Re: [apache/incubator-mxnet] [RFC] MXNet 2.0 API Deprecation (#17676)

2020-06-10 Thread Haibin Lin
Drop the following loss operators since they are used with Module API:
- mx.symbol.LinearRegressionOutput
- mx.symbol.MAERegressionOutput
- mx.symbol.LogisticRegressionOutput
- mx.symbol.SVMOutput
- mx.symbol.SoftmaxOutput
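For users migrating off these Module-era operators, the Gluon loss classes are the natural replacements. The mapping below is my reading of the closest `mx.gluon.loss` counterparts, not an official table; defaults (scaling, reduction) may differ, so verify when migrating:

```python
# Hedged sketch: suggested Gluon replacements for the Module-era loss
# operators listed above. The right-hand names are classes in
# mx.gluon.loss; exact equivalence of defaults is not guaranteed.
GLUON_LOSS_REPLACEMENTS = {
    "LinearRegressionOutput": "L2Loss",
    "MAERegressionOutput": "L1Loss",
    "LogisticRegressionOutput": "SigmoidBinaryCrossEntropyLoss",
    "SVMOutput": "HingeLoss",
    "SoftmaxOutput": "SoftmaxCrossEntropyLoss",
}

def suggest_replacement(op_name):
    """Return the suggested mx.gluon.loss class name for a deprecated op,
    or None if the op is not in the map."""
    return GLUON_LOSS_REPLACEMENTS.get(op_name)
```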


-- 
You are receiving this because you authored the thread.
Reply to this email directly or view it on GitHub:
https://github.com/apache/incubator-mxnet/issues/17676#issuecomment-642389680

Re: [apache/incubator-mxnet] [RFC] Apache MXNet 2.0 Roadmap (#16167)

2020-03-28 Thread Haibin Lin
@lilongyue the integration of bytePS to mxnet is in this PR 
https://github.com/apache/incubator-mxnet/pull/17555

-- 
You are receiving this because you were mentioned.
Reply to this email directly or view it on GitHub:
https://github.com/apache/incubator-mxnet/issues/16167#issuecomment-605494475

Re: [VOTE] Release Apache MXNet (incubating) version 1.6.0.rc1

2020-01-10 Thread Haibin Lin
Shall we provide pip wheels for later release votes?

Not everyone knows how to build MXNet from source (and building from source
also takes a long time). Providing a pip wheel would lower the bar for users
who want to test MXNet and participate in voting.

Best,
Haibin

On Fri, Jan 10, 2020 at 3:50 PM Haibin Lin  wrote:

> +1
>
> Built from source with USE_CUDA=1 on Ubuntu. Ran gluon-nlp unit tests and
> they passed.
>
> On Fri, Jan 10, 2020 at 3:18 PM Karan Jariwala 
> wrote:
>
>> +1
>>
>> Tested MXNet with and without MKL-DNN on Ubuntu 16.04 with Horovod 0.18.2.
>> No regression seen between 1.5.1 and 1.6.0.rc1 when running horovod_MXNet
>> integration test.
>>
>>
>> Thanks,
>>
>> Karan
>>
>> On Fri, Jan 10, 2020 at 2:47 PM Markus Weimer  wrote:
>>
>> > +1 (binding)
>> >
>> > I tested on Ubuntu 18.04 on the Windows Subsystem for Linux.
>> >
>> > Tested:
>> >   * Built from source using the instructions here [0]
>> >   * Ran the tests in `./build/tests/mxnet_unit_tests`
>> >   * SHA512 of the archive
>> >
>> > Not tested:
>> >   * Language bindings
>> >   * CUDA or other GPU acceleration
>> >   * LICENSE and compliance status
>> >   * Signature of the archive
>> >
>>
>


Re: [VOTE] Release Apache MXNet (incubating) version 1.6.0.rc1

2020-01-10 Thread Haibin Lin
+1

Built from source with USE_CUDA=1 on Ubuntu. Ran gluon-nlp unit tests and
they passed.

On Fri, Jan 10, 2020 at 3:18 PM Karan Jariwala 
wrote:

> +1
>
> Tested MXNet with and without MKL-DNN on Ubuntu 16.04 with Horovod 0.18.2.
> No regression seen between 1.5.1 and 1.6.0.rc1 when running horovod_MXNet
> integration test.
>
>
> Thanks,
>
> Karan
>
> On Fri, Jan 10, 2020 at 2:47 PM Markus Weimer  wrote:
>
> > +1 (binding)
> >
> > I tested on Ubuntu 18.04 on the Windows Subsystem for Linux.
> >
> > Tested:
> >   * Built from source using the instructions here [0]
> >   * Ran the tests in `./build/tests/mxnet_unit_tests`
> >   * SHA512 of the archive
> >
> > Not tested:
> >   * Language bindings
> >   * CUDA or other GPU acceleration
> >   * LICENSE and compliance status
> >   * Signature of the archive
> >
>


Re: Stopping nightly releases to Pypi

2020-01-04 Thread Haibin Lin
I was trying the nightly builds, but none of them are available:

pip3 install https://apache-mxnet.s3-us-west-2.amazonaws.com/dist/2020-01-01/dist/mxnet_cu100-1.6.0b20200101-py2.py3-none-manylinux1_x86_64.whl --user
pip3 install https://apache-mxnet.s3-us-west-2.amazonaws.com/dist/2020-01-02/dist/mxnet_cu100-1.6.0b20200102-py2.py3-none-manylinux1_x86_64.whl --user
pip3 install https://apache-mxnet.s3-us-west-2.amazonaws.com/dist/2020-01-03/dist/mxnet_cu100-1.6.0b20200103-py2.py3-none-manylinux1_x86_64.whl --user
pip3 install https://apache-mxnet.s3-us-west-2.amazonaws.com/dist/2020-01-04/dist/mxnet_cu100-1.6.0b20200104-py2.py3-none-manylinux1_x86_64.whl --user

ERROR: Could not install requirement mxnet-cu100==1.6.0b20200103 from
https://apache-mxnet.s3-us-west-2.amazonaws.com/dist/2020-01-03/dist/mxnet_cu100-1.6.0b20200103-py2.py3-none-manylinux1_x86_64.whl
because of HTTP error 404 Client Error: Not Found for url:
https://apache-mxnet.s3-us-west-2.amazonaws.com/dist/2020-01-03/dist/mxnet_cu100-1.6.0b20200103-py2.py3-none-manylinux1_x86_64.whl
for URL
https://apache-mxnet.s3-us-west-2.amazonaws.com/dist/2020-01-03/dist/mxnet_cu100-1.6.0b20200103-py2.py3-none-manylinux1_x86_64.whl

Please let me know if I typed wrong URLs.

1. The discoverability of available nightly builds needs improvement. If
someone can help write a script to list all links that exist, that would be
very helpful.
2. If a nightly build fails, how does the community learn the reason for the
failure and potentially offer help? Currently I don't have much visibility
into the nightly build status.
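A starting point for point 1 above could be a small script that probes the S3 URLs by date. The URL pattern is inferred from the links earlier in this thread; the flavor name and the assumption that a HEAD request cleanly distinguishes existing wheels from missing ones are mine, not confirmed:

```python
# Hedged sketch of a "list recent nightly builds" helper. Assumes the
# wheel URL pattern seen in this thread:
#   <BASE>/<YYYY-MM-DD>/dist/<flavor>-<version>b<YYYYMMDD>-py2.py3-none-manylinux1_x86_64.whl
import urllib.request
import urllib.error
from datetime import date, timedelta

BASE = "https://apache-mxnet.s3-us-west-2.amazonaws.com/dist"

def wheel_url(day, flavor="mxnet_cu100", version="1.6.0"):
    """Build the nightly wheel URL for a given date and flavor."""
    tag = day.strftime("%Y%m%d")
    return (f"{BASE}/{day.isoformat()}/dist/"
            f"{flavor}-{version}b{tag}-py2.py3-none-manylinux1_x86_64.whl")

def exists(url, timeout=10):
    """Return True if the URL answers a HEAD request with a 2xx status."""
    req = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except urllib.error.URLError:
        return False

def recent_nightlies(days=7, flavor="mxnet_cu100"):
    """List nightly wheel URLs from the last `days` days that actually exist."""
    today = date.today()
    candidates = [wheel_url(today - timedelta(d), flavor) for d in range(days)]
    return [u for u in candidates if exists(u)]
```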

Best,
Haibin


On Fri, Jan 3, 2020 at 5:47 PM Pedro Larroy 
wrote:

> Just to clarify, the current CI is quite an overhead to maintain for
> several reasons, this complexity is overkill for CD. Jenkins also has
> constant plugin upgrades, security vulnerabilities, has to be restarted
> from time to time as it stops working... and to make binary builds from an
> environment which runs unsafe code, I don't think is good practice. So for
> that, having a separate Jenkins, CodeBuild, Drone or using a separate
> Jenkins node is the right solution. Agree with you that is just a
> scheduler, but somebody is making efforts to keep it running. If you have
> the appetite and resources to duplicate it for CD please go ahead.
>
> On Fri, Jan 3, 2020 at 3:25 PM Marco de Abreu 
> wrote:
>
> > Regarding your point of finding somebody to maintain the solution: At
> > Apache we usually retire things if there's no maintainer, since that
> > indicates that the feature/system is not of enough interest to warrant
> > maintenance - otherwise, someone would step up.
> >
> > While assistance in the form of a fix is always appreciated, the fix
> still
> > has to conform with the way this project and Apache operates. Next time
> I'd
> > recommend to contribute time on improving the existing community solution
> > instead of developing an internal system.
> >
> > -Marco
> >
> > Marco de Abreu  schrieb am Sa., 4. Jan. 2020,
> > 00:21:
> >
> > > Sam, while I understand that this solution was developed out of
> > necessity,
> > > my question why a new system has been developed instead of fixing the
> > > existing one or adapting the solution. CodeBuild is a scheduler in the
> > same
> > > fashion as Jenkins is. It runs code. So you can adapt it to Jenkins
> > without
> > > much hassle.
> > >
> > > I'm not volunteering for this - why should I? The role of a PMC member
> is
> > > to steer the direction of the project. Just because a manager points
> > > towards a certain direction, if doesn't mean that they're going to do
> it.
> > >
> > > Apparently there was enough time at some point to develop a new
> solution
> > > from scratch. It might have been a solution for your internal team and
> > > that's fine, but upgrading it "temporarily" to be the advertised way on
> > the
> > > official website is something different.
> > >
> > > I won't argue about how the veto can be enforced. I think it's in the
> > best
> > > interest of the project if we try working on a solution instead of
> > spending
> > > time on trying to figure out the power of the PMC.
> > >
> > > Pedro, that's certainly a step towards the right direction. But
> > committers
> > > would also need access to the control plane of the system - to trigger,
> > > stop and audit builds. We could go down that road, but i think the
> fewer
> > > systems, the better - also for the sake of maintainability.
> > >
> > > Best regards,
> > > Marco
> > >
> > >
> > >
> > > Pedro Larroy  schrieb am Fr., 3. Jan.
> > 2020,
> > > 20:55:
> > >
> > >> I'm not involved in such efforts, but one possibility is to have the
> > yaml
> > >> files that describe the pipelines for CD in the Apache repositories,
> > would
> > >> that be acceptable from the Apache POV? In the end they should be very
> > >> thin
> > >> and calling the scripts that are part of the CD packages.
> > >>
> > >> On Fri, Jan 3, 2020 at 6:56 AM Marco de 

Re: Stopping nightly releases to Pypi

2019-12-16 Thread Haibin Lin
Shall we update the website installation page with nightly build
information as well (after we figure out the CD details)?

Best,
Haibin

On Tue, Dec 10, 2019 at 10:15 PM Lausen, Leonard 
wrote:

> Not yet. As a community, we first need to add the nightly build hosting
> feature
> to the community run CD and then we can add the page so that the exact date
> doesn't need to be specified.
>
> I'm not sure what steps are required for this. Do we need to host the
> artifacts
> on Apache's infrastructure? Or can we host the nightly CD artifacts as
> part of
> the AWS sponsored community-maintained CD (S3 bucket associated to the
> account)?
>
> In the meantime, the "proprietary" AWS build solution could be extended to
> publish an html page per artifact type (mxnet, mxnet-cu100, ...)
> containing a
> link to all recent builds.
>
> Best regards
> Leonard
>
> On Tue, 2019-12-10 at 22:03 -0800, Lin Yuan wrote:
> > Is there a way to install the latest nightly package without having to
> > specify exact date?
> >
> > Thanks,
> >
> > Lin
> >
> > On Sun, Dec 8, 2019 at 6:13 PM Lausen, Leonard  >
> > wrote:
> >
> > > From Shanghai, the closest endpoint (automatically chosen endpoint) is
> in
> > > Tokyo
> > > and download speed for mxnet-mkl was on average 1.7 MB/s with a
> maximum of
> > > 5
> > > MB/s during my test.
> > >
> > > On Sun, 2019-12-08 at 01:30 +, Sheng Zha wrote:
> > > > > Heres a set of links for today’s builds
> > > > >
> > > > > (Plain mxnet, no mkl no cuda)
> > > > >
> > >
> https://apache-mxnet.s3-us-west-2.amazonaws.com/dist/2019-12-07/dist/mxnet-1.6.0b20191207-py2.py3-none-manylinux1_x86_64.whl
> > > > > (mxnet-mkl)
> > > > >
> > >
> https://apache-mxnet.s3-us-west-2.amazonaws.com/dist/2019-12-07/dist/mxnet_mkl-1.6.0b20191207-py2.py3-none-manylinux1_x86_64.whl
> > > > > (mxnet-cuXXX)
> > > > >
> > >
> https://apache-mxnet.s3-us-west-2.amazonaws.com/dist/2019-12-07/dist/mxnet_cu90-1.6.0b20191207-py2.py3-none-manylinux1_x86_64.whl
> > >
> https://apache-mxnet.s3-us-west-2.amazonaws.com/dist/2019-12-07/dist/mxnet_cu92-1.6.0b20191207-py2.py3-none-manylinux1_x86_64.whl
> > >
> https://apache-mxnet.s3-us-west-2.amazonaws.com/dist/2019-12-07/dist/mxnet_cu100-1.6.0b20191207-py2.py3-none-manylinux1_x86_64.whl
> > >
> https://apache-mxnet.s3-us-west-2.amazonaws.com/dist/2019-12-07/dist/mxnet_cu101-1.6.0b20191207-py2.py3-none-manylinux1_x86_64.whl
> > > > > (mxnet-cuXXXmkl)
> > > > >
> > >
> https://apache-mxnet.s3-us-west-2.amazonaws.com/dist/2019-12-07/dist/mxnet_cu90mkl-1.6.0b20191207-py2.py3-none-manylinux1_x86_64.whl
> > >
> https://apache-mxnet.s3-us-west-2.amazonaws.com/dist/2019-12-07/dist/mxnet_cu92mkl-1.6.0b20191207-py2.py3-none-manylinux1_x86_64.whl
> > >
> https://apache-mxnet.s3-us-west-2.amazonaws.com/dist/2019-12-07/dist/mxnet_cu100mkl-1.6.0b20191207-py2.py3-none-manylinux1_x86_64.whl
> > >
> https://apache-mxnet.s3-us-west-2.amazonaws.com/dist/2019-12-07/dist/mxnet_cu101mkl-1.6.0b20191207-py2.py3-none-manylinux1_x86_64.whl
> > > > These links are not utilizing the s3 accelerate feature (i.e. not
> backed
> > > by
> > > > cloudfront edges). Please use repo.mxnet.io instead. The updated
> links
> > > are:
> > > > (Plain mxnet, no mkl no cuda)
> > > >
> > >
> https://repo.mxnet.io/dist/2019-12-07/dist/mxnet-1.6.0b20191207-py2.py3-none-manylinux1_x86_64.whl
> > > > (mxnet-mkl)
> > > >
> > >
> https://repo.mxnet.io/dist/2019-12-07/dist/mxnet_mkl-1.6.0b20191207-py2.py3-none-manylinux1_x86_64.whl
> > > > (mxnet-cuXXX)
> > > >
> > >
> https://repo.mxnet.io/dist/2019-12-07/dist/mxnet_cu90-1.6.0b20191207-py2.py3-none-manylinux1_x86_64.whl
> > >
> https://repo.mxnet.io/dist/2019-12-07/dist/mxnet_cu92-1.6.0b20191207-py2.py3-none-manylinux1_x86_64.whl
> > >
> https://repo.mxnet.io/dist/2019-12-07/dist/mxnet_cu100-1.6.0b20191207-py2.py3-none-manylinux1_x86_64.whl
> > >
> https://repo.mxnet.io/dist/2019-12-07/dist/mxnet_cu101-1.6.0b20191207-py2.py3-none-manylinux1_x86_64.whl
> > > > (mxnet-cuXXXmkl)
> > > >
> > >
> https://repo.mxnet.io/dist/2019-12-07/dist/mxnet_cu90mkl-1.6.0b20191207-py2.py3-none-manylinux1_x86_64.whl
> > >
> https://repo.mxnet.io/dist/2019-12-07/dist/mxnet_cu92mkl-1.6.0b20191207-py2.py3-none-manylinux1_x86_64.whl
> > >
> https://repo.mxnet.io/dist/2019-12-07/dist/mxnet_cu100mkl-1.6.0b20191207-py2.py3-none-manylinux1_x86_64.whl
> > >
> https://repo.mxnet.io/dist/2019-12-07/dist/mxnet_cu101mkl-1.6.0b20191207-py2.py3-none-manylinux1_x86_64.whl
> > > > When updating the installation doc we should use repo.mxnet.io
> domain
> > > name
> > > > too.
> > > >
> > > > Best,
> > > > -sz
> > > >
> > > > On 2019/12/07 17:39:40, "Skalicky, Sam" 
> > > wrote:
> > > > > Hi MXNet Community,
> > > > >
> > > > > We have been working on getting nightly builds fixed and made
> available
> > > > > again. We’ve made another system using AWS CodeBuild & S3 to work
> > > around the
> > > > > problems with Jenkins CI, PyPI, etc. It is currently building all
> the
> > > > > flavors and publishing to an S3 

Re: [apache/incubator-mxnet] [RFC] Unified API for Distributed Data Parallel Training (#16795)

2019-12-07 Thread Haibin Lin
I do expect the API to change in the future. Currently @szhengac @zhongyuchen 
and I are exploring APIs for gradient compression with a few algorithms, and we 
may bring the best practices back to MXNet. 

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/apache/incubator-mxnet/issues/16795#issuecomment-562907768

Re: [apache/incubator-mxnet] [RFC] Unified API for Distributed Data Parallel Training (#16795)

2019-11-12 Thread Haibin Lin
I did mean use cases 2, 3, and 4. 
Initialization is done in the constructor `kv.__init__()`, and for horovod it 
could simply be a `hvd.init()` call. 

I have not discussed problem 1 in much detail. horovod uses mpirun to 
set up connections and launch processes, while byteps/p3 and the native kvstore 
currently use the `dmlc/launcher` script. I do see that `dmlc/launcher` has MPI 
support, but I need to play with it more to see if it fits existing use cases. 
That said, I don't see fundamental blockers for (1). 

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/apache/incubator-mxnet/issues/16795#issuecomment-553089601

[apache/incubator-mxnet] [RFC] Unified API for Distributed Data Parallel Training (#16795)

2019-11-12 Thread Haibin Lin
## Background  
Data parallel training is the most common distributed training technique when 
it comes to multiple GPUs or multiple hosts. Currently, several communication 
backends provide functionality for communicating tensors across devices/hosts 
for data parallel training. For MXNet users, there are a few options: 
1. native kvstore
2. [p3 kvstore](https://www.sysml.cc/doc/2019/75.pdf)
3. [horovod](https://github.com/horovod/horovod/) 
4. [bytePS](https://github.com/bytedance/byteps/)

These different implementations provide different APIs:
- native kvstore
  - high level APIs: `mx.gluon.Trainer`
  - low level APIs: `kv.push`, `kv.pull`, `kv.init`
- horovod
  - high level APIs: `hvd.init()`, `hvd.DistributedTrainer`
  - low level APIs: `hvd.broadcast`, `hvd.allreduce`
- bytePS
  - high level APIs: `bps.init()`, `bps.DistributedTrainer`
  - low level APIs: `byteps_declare_tensor`, `byteps_push_pull`

Here, high-level APIs refer to the APIs a typical novice user uses for a 
distributed training job. To communicate tensors not managed by a `Trainer` or 
`DistributedTrainer`, users may turn to the low-level APIs to send/receive a 
custom tensor. 

## Problem Statement
Sometimes we want to easily switch between these different distributed 
communication backends and compare which one performs best in a particular 
distributed training environment. Because these implementations expose 
different APIs, trying each one of them requires lots of user code changes. 
It typically involves custom logic to:
1. launch Python processes for a distributed training job ([BytePS 
launch](https://github.com/bytedance/byteps/blob/master/docs/step-by-step-tutorial.md#mxnet)
 v.s. [horovod](https://github.com/horovod/horovod#running-horovod)) 
2. initialize communication backends ([example 
code](https://github.com/eric-haibin-lin/gluon-nlp/blob/benchmark/scripts/bert/run_pretraining.py#L187-L228))
3. create (Distributed)Trainers ([example 
code](https://github.com/eric-haibin-lin/gluon-nlp/blob/benchmark/scripts/bert/run_pretraining.py#L297-L303))
4. send custom tensors ([example 
code](https://github.com/eric-haibin-lin/gluon-nlp/blob/benchmark/scripts/bert/run_pretraining.py#L582-L586))

## Proposal 

My proposal is to provide a unified API that allows custom communication 
backends to act as plugins for MXNet, so that no user code changes are 
required to switch between these backends.

Specifically, a communication backend provider implements the following Python 
APIs.

class `AbstractKVStore`:
- `def __init__()`: initialization
- `def broadcast(name, tensor, root_rank)`: broadcast the `tensor` at `root_rank` 
to all ranks
  - `name`: tensor name (int or str)
  - `tensor`: ndarray
- `def push_pull(name, tensor, output)`: push `tensor` and pull the result into 
`output`. When no optimizer is set, it sums `tensor` across all ranks; the 
summed result is then pulled back into the `output` tensor.
  - `name`: tensor name (int or str)
  - `tensor`: ndarray to push
  - `output`: ndarray to store the pulled result
- `def set_optimizer(optimizer)`: set the optimizer at the parameter servers. 
Optional interface, used only by parameter-server-based backends.
  - `optimizer`: mx.optimizer.Optimizer

A communication backend provider can implement these APIs and register a new 
KVStore in MXNet via `mx.kv.register()`. For MXNet users, they only need to 
interact with the following MXNet APIs:
- using high level APIs for a typical data parallel model training
```
backend = mx.kv.create('horovod')
trainer = mx.gluon.Trainer(kv=backend)
# forward: loss = net(data)
# backward: loss.backward()
# update: trainer.step()
```
- using low level APIs to reduce a custom tensor
```
kv.broadcast("name", custom_ndarray, root_rank=0)
kv.push_pull("name", custom_ndarray, output=custom_ndarray)
```
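The registration mechanism itself can be sketched without any MXNet internals. Below is an illustrative, mxnet-free mock-up of how `mx.kv.register()` / `mx.kv.create()` could wire backends to the abstract interface; the `LocalKVStore` backend and all implementation details are hypothetical stand-ins, not part of the proposal:

```python
# Illustrative-only sketch of the proposed backend plugin mechanism.
# AbstractKVStore mirrors the interface described in this RFC; the
# registry is a plain dict keyed by backend name. No real communication
# happens here.
from abc import ABC, abstractmethod

_BACKENDS = {}

def register(name):
    """Class decorator that registers a KVStore backend under `name`."""
    def wrap(cls):
        _BACKENDS[name.lower()] = cls
        return cls
    return wrap

def create(name, **kwargs):
    """Instantiate a registered backend by name, as mx.kv.create() would."""
    return _BACKENDS[name.lower()](**kwargs)

class AbstractKVStore(ABC):
    @abstractmethod
    def broadcast(self, name, tensor, root_rank):
        """Broadcast `tensor` from `root_rank` to all ranks."""

    @abstractmethod
    def push_pull(self, name, tensor, output):
        """Push `tensor`, reduce across ranks, pull the result into `output`."""

    def set_optimizer(self, optimizer):
        """Optional; only parameter-server-based backends implement this."""
        raise NotImplementedError

@register("local")
class LocalKVStore(AbstractKVStore):
    """Hypothetical single-process stand-in: with one rank, broadcast is the
    identity and the push_pull "sum" is just the tensor itself."""
    def broadcast(self, name, tensor, root_rank):
        return tensor

    def push_pull(self, name, tensor, output):
        output[:] = tensor  # single rank, so the reduced value equals tensor
        return output
```

A horovod or bytePS provider would register its own subclass the same way, keeping `mx.gluon.Trainer(kv=backend)` unchanged for users.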

## Limitation 

The unified interface does not cover advanced features such as sparse ndarrays 
or gradient compression, which are less mature and not provided by all 
communication backends. 

The above proposal targets use cases 2, 3, and 4 in the problem statement. It 
can be extended to tackle 1 as well if the feedback is positive. 

@ymjiang @apeforest @anandj91 @rich-junwang 

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/apache/incubator-mxnet/issues/16795

Re: new website, docs code freeze

2019-10-02 Thread Haibin Lin
cuss forum.
> > >
> > > Good hunting!
> > >
> > > Thomas
> > >
> > >
> > >
> > > Le mer. 25 sept. 2019 à 10:10, Marco de Abreu  >
> > a
> > > écrit :
> > >
> > >> Good catch, Mu! Also good idea, Philip!
> > >>
> > >> Aaron and Thomas, are you going to work on this?
> > >>
> > >> -Marco
> > >>
> > >> On Wed, Sep 25, 2019 at 1:28 AM Mu Li  wrote:
> > >>
> > >> > The questions I found are:
> > >> >
> > >> > 1. Not ever page contains, especially the homepage
> > >> >
> > >> >
> > >>
> >
> http://mxnet.incubator.apache.org/api/python/docs/_static/google_analytics.js
> > >> > 2. The correct tracking id is UA-96378503-1 instead of
> UA-96378503-11
> > in
> > >> >
> > >> >
> > >>
> >
> http://mxnet.incubator.apache.org/api/python/docs/_static/google_analytics.js
> > >> >
> > >> > On Tue, Sep 24, 2019 at 4:23 PM Mu Li  wrote:
> > >> >
> > >> > > I think the reason is that the google tracker is not included in
> the
> > >> new
> > >> > > website.
> > >> > >
> > >> > > On Tue, Sep 24, 2019 at 4:17 PM Marco de Abreu <
> > >> marco.g.ab...@gmail.com>
> > >> > > wrote:
> > >> > >
> > >> > >> Hello,
> > >> > >>
> > >> > >> I checked the Google Analytics statistics and the launch of the
> new
> > >> > >> website reduced the traffic by over 80%:
> > >> > >>
> > >> > >> [image: image.png]
> > >> > >>
> > >> > >> (Please let me know if the image is not visible)
> > >> > >>
> > >> > >> How shall we handle this?
> > >> > >>
> > >> > >> Best regards,
> > >> > >> Marco
> > >> > >>
> > >> > >> On Mon, Sep 23, 2019 at 7:30 AM Zhao, Patric <
> > patric.z...@intel.com>
> > >> > >> wrote:
> > >> > >>
> > >> > >>> For the install page [1], I suggest to add the selection of
> > backend
> > >> > >>> DeepNumpy [2] which will be more clean.
> > >> > >>>
> > >> > >>> [1] http://mxnet.incubator.apache.org/index.html
> > >> > >>> [2] https://numpy.mxnet.io/#installation
> > >> > >>>
> > >> > >>>
> > >> > >>>
> > >> > >>> > -Original Message-
> > >> > >>> > From: kellen sunderland 
> > >> > >>> > Sent: Monday, September 23, 2019 12:47 PM
> > >> > >>> > To: dev@mxnet.incubator.apache.org
> > >> > >>> > Subject: Re: new website, docs code freeze
> > >> > >>> >
> > >> > >>> > New site looks good.  I do notice that a few tutorials from
> the
> > >> old
> > >> > >>> site are
> > >> > >>> > missing (for example the TensorRT tutorial).  Any plans to
> bring
> > >> them
> > >> > >>> back?
> > >> > >>> >
> > >> > >>> > On Sun, Sep 22, 2019 at 10:04 AM Haibin Lin <
> > >> > haibin.lin@gmail.com>
> > >> > >>> > wrote:
> > >> > >>> >
> > >> > >>> > > Another issue I found with the current website: the Sphinx
> > >> object
> > >> > >>> > > inventory
> > >> > >>> > > <https://www.sphinx-
> > >> > >>> > doc.org/en/master/usage/extensions/intersphinx.htm
> > >> > >>> > > l> file https://mxnet.apache.org/objects.inv is missing.
> > >> GluonNLP
> > >> > >>> > > relies on this file to link document across projects. Shall
> we
> > >> add
> > >> > it
> > >> > >>> > > back?
> > >> > >>> > >
> > >> > >>> > > Best,
> > >> > >>> > > Haibin
> > >> > >>> > >
> >

Re: new website, docs code freeze

2019-09-22 Thread Haibin Lin
Another issue I found with the current website: the Sphinx object inventory
<https://www.sphinx-doc.org/en/master/usage/extensions/intersphinx.html>
file https://mxnet.apache.org/objects.inv is missing. GluonNLP relies on
this file to link documentation across projects. Shall we add it back?

Best,
Haibin

On Sun, Sep 22, 2019 at 2:04 AM Lieven Govaerts  wrote:

> Hi,
>
>
> On Sat, 21 Sep 2019 at 06:28, Thomas DELTEIL 
> wrote:
>
> > Thanks all for the feedback,
> >
> > We'll send an email next week with the list of missing features, content
> > and bugs that we plan to fix.
> > We took the option of releasing early, with some features missing, rather
> > than trying to be at feature parity with the old website before launching
> > the website.
> > The reason why we decided to do that is two-fold:
> > - playing catch-up with docs in master introduce daily conflicts that
> need
> > to be resolved and introduce opportunity for errors
> > - by releasing early, we can take advantage of the community
> contributions
> > in modifying whatever the community feels like a better way of doing
> > things.
> >
> > One of the goals of the new website was to disentangle the main website,
> > now called "static_site" to the auto-generated docs. Now the overall site
> > is made of a main static site, with easy to modify content and easy to
> > understand architecture for anybody familiar with basic html, and a
> > collection of mini-websites for each language bindings that can be built
> in
> > isolation and that are self-contained. Actually the new CI jobs builds
> all
> > of them in parallel independently.
> >
> > There is PLENTY of room for improvement, it would be great if the
> community
> > can help contribute to bring the new website at the same level of content
> > richness as the old one, and then even further.
> >
> > Missing features:
> > - As pointed by Haibin, the API docs do not have the full list of
> operators
> > and classes. There is a mix of auto-generated docs based on packages, and
> > some docs that are spelled out manually to improve the logical
> organization
> > of the package where there is a need. The drawback with manually listed
> > classes in a package is that it's very easy to miss some. If someone
> wanted
> > to build a sanity check that would automatically detect which classes are
> > not in the documentation, or if someone knew how to enable that with
> > sphinx, that would be a great addition to the python docs
> > - There is missing content in the python tutorials, and the
> discoverability
> > could be improved. Some old tutorials have not been migrated just yet.
> > - The nightly tests on tutorials have been disabled for now
> > - There is no "Download jupyter notebook" for tutorials just yet.
> > - Non-python tutorials might benefit from a blurb description and a
> better
> > content organization.
> > - Python tutorials could be better organized, have a picture accompanying
> > their description
> > - There is no site-wide search, this is not an easy problem to solve to
> be
> > fair given the static nature of the website, but maybe an external plugin
> > might be able to give a half-way solution
> > - There is no version selector for the docs
> > - There is bug in search box of the python docs, but this is just a small
> > JS bug that can be fixed easily (on my list for next week)
> > - Most old links have not had a redirect put in place.
> >
> >
> I noticed on the Ubuntu home page in the Developer dropdown that the link
> MXNet on Ubuntu <https://mxnet.incubator.apache.org/install/index.html
> >with
> Nvidia
> <
> https://www.nvidia.com/en-us/data-center/gpu-accelerated-applications/mxnet/
> >
> doesn't work anymore, it points to:
> https://mxnet.incubator.apache.org/install/index.html
>
> Also, on the MXNet 'getting started' page
> https://mxnet.incubator.apache.org/get_started , the link "Ubuntu
> Installation Guide" at the bottom doesn't work either, it points to:
> https://mxnet.incubator.apache.org/ubuntu_setup.html
>
>
> I suggest you do a scan of the new website to find these dangling links.
>
> regards,
>
> Lieven
>
>
>
> > We'll formalize this in github issues next week, but they are all fairly
> > small and helping out on these would be a great way of familiarizing
> > yourself with the new website build system and website architecture.
> >
> >  Thanks all for the feedback, please keep it coming!
> >
> > Thomas Delteil
> >
> > Le sam. 21 se

Re: new website, docs code freeze

2019-09-20 Thread Haibin Lin
It looks like my previous email did not go through. Re-sending:

Hi Aaron,

The website looks cool. Thanks for pushing this to production. A few
questions:

- I was looking for the API doc for mx.sym.dot, but I find that most
operators under mx.sym.* are missing. Is this expected?
- I was also checking the search functionality: searching the keyword
"ndarray" returns only one result, "mxnet.ndarray.NDArray", which doesn't
seem right. The animation also keeps going (Searching. -> Searching.. ->
Searching...), giving the impression that the search never completes(?).

Best,
Haibin


On Fri, Sep 20, 2019 at 4:50 PM Chaitanya Bapat 
wrote:

> Thanks Aaron and the team for launching new website!
>
> 1. There's no search button anywhere on the landing page.
> 2. I wasn't able to find FAQ (and without search button I dont have option
> but to go manually on each menu). Only when I go to Docs -> FAQ
> -> Extend and Cotribute (that I got what I wanted).
>
> Suggestions
> Might want to make this searchable and pop FAQ on the main page (or
> somewhere prominent)
>
> Thanks,
> Chai
>
>
> On Fri, 20 Sep 2019 at 14:58, Przemysław Trędak 
> wrote:
>
> > There seems to be a problem with (at least Python, did not check others)
> > APIs. For example this page:
> >
> >
> https://mxnet.incubator.apache.org/api/python/docs/api/symbol/_autogen/mxnet.symbol.Symbol.argmax.html
> >
> > says that it is a convenience method for argmax (with a link), but
> > clicking that link just points to the same website (and so user has no
> way
> > of getting to the docs of the actual operator).
> >
> > When I tried to manually remove Symbol from the URL to get to
> > mxnet.symbol.argmax.html, I got a "Not found" webpage which I guess also
> > should not happen (ignoring the fact that this should exist, going to
> > random URL under the website should redirect to the main page I think).
> >
> > Przemek
> >
> > On 2019/09/20 16:41:28, Lin Yuan  wrote:
> > > Looks very neat. Thank you Aaron and many others for launching this!
> > >
> > > On Fri, Sep 20, 2019 at 7:31 AM Carin Meier 
> > wrote:
> > >
> > > > Nice!!! Congrats everyone!
> > > >
> > > > On Fri, Sep 20, 2019 at 10:28 AM Aaron Markham <
> > aaron.s.mark...@gmail.com>
> > > > wrote:
> > > >
> > > > > Alrighty! The new site is launched. You might need to clear your
> > cache.
> > > > >
> > > > > Cheers,
> > > > > Aaron
> > > > >
> > > > > On Thu, Sep 19, 2019 at 3:33 PM Aaron Markham <
> > aaron.s.mark...@gmail.com
> > > > >
> > > > > wrote:
> > > > > >
> > > > > > Thanks everyone. The PRs passed CI, but please continue holding
> > off on
> > > > > > docs and CI edits. Unless there are any objections, I'd like to
> > launch
> > > > > > the new website today.
> > > > > >
> > > > > > On Wed, Sep 18, 2019 at 7:46 AM Aaron Markham <
> > > > aaron.s.mark...@gmail.com>
> > > > > wrote:
> > > > > > >
> > > > > > > Hi everyone,
> > > > > > > The last two PRs [1][2] for the new website and docs have
> passed
> > CI
> > > > > > > (finally). Please do not make changes to /docs or /ci until we
> > get
> > > > > > > these approved and merged. Every time there's a merge conflict
> > it has
> > > > > > > set us back a day or two while shepherding the PRs through CI
> > again.
> > > > > > > Unless there are catastrophic issues discovered in a review, I
> > > > > > > recommend that we hold any patches or updates to the PRs to
> > follow-up
> > > > > > > PRs.
> > > > > > >
> > > > > > > There are four steps to launch:
> > > > > > > 1. Once the PRs are approved, the plan is to merge 15885 to
> > delete
> > > > the
> > > > > > > old content first.
> > > > > > > 2. Then immediately merge 15883 to add in the new CI flows and
> > > > updates
> > > > > > > to the content Thomas and I have already had merged in 15884
> [3].
> > > > > > > 3. I will change the website validation Jenkins pipeline to
> > point to
> > > > > > > the new pipeline.
> > > > > > > 4. I will change the website publishing Jenkins pipeline to
> > point to
> > > > > > > its new pipeline as well. Once triggered, the old site will be
> > > > > > > replaced with the new one.
> > > > > > >
> > > > > > > Post launch we'll need to update the DNS for beta.mxnet.io to
> > point
> > > > to
> > > > > > > production, and there will likely be some redirect/.htaccess
> > updates
> > > > > > > needed next week to assist with any deep linking and 404 issues
> > that
> > > > > > > pop up.
> > > > > > >
> > > > > > > Cheers,
> > > > > > > Aaron
> > > > > > >
> > > > > > > [1] https://github.com/apache/incubator-mxnet/pull/15885
> > > > > > > [2] https://github.com/apache/incubator-mxnet/pull/15883
> > > > > > > [3] https://github.com/apache/incubator-mxnet/pull/15884
> > > > >
> > > >
> > >
> >
>
>
> --
> *Chaitanya Prakash Bapat*
> *+1 (973) 953-6299*
>
> [image: https://www.linkedin.com//in/chaibapat25]
> [image: https://www.facebook.com/chaibapat
> ]
> [image:
> 

Re: [Discuss] MXNet Python < 3.6 Support Deprecation

2019-08-24 Thread Haibin Lin
+1

On Thu, Aug 22, 2019 at 11:22 PM Junru Shao  wrote:

> +1 for 3.6+
>
> On Thu, Aug 22, 2019 at 8:54 AM Marco de Abreu 
> wrote:
>
> > +1 for 3.6+
> >
> > Yuan Tang  schrieb am Do., 22. Aug. 2019,
> 08:08:
> >
> > > +1 to target 3.6+
> > >
> > > On Thu, Aug 22, 2019 at 11:01 AM Leonard Lausen 
> > wrote:
> > >
> > > > Hi,
> > > >
> > > > Pedro stated "Seems 3.6 is a reasonable choice." and there have been
> a
> > > > few +1 after Chaitanya's reply to Pedro. I would like to check if
> these
> > > > only refer to Chaitanya's mail about a dedicated "improvement" effort
> > or
> > > > about dropping 3.5.
> > > >
> > > > Thus two questions:
> > > >
> > > > 1) Are there any concerns about dropping Python 3.5? Now is your
> chance
> > > to
> > > > speak up if you think so.
> > > >
> > > > 2) Should new MXNet 1.x (experimental?) functionality (for example
> > numpy
> > > > compatible interface) only target the Python versions to be supported
> > in
> > > > MXNet 2? The current plan is to make many MXNet 2 features available
> as
> > > > "opt-in" in MXNet 1.x. Supporting older Python versions on MXNet 1
> for
> > > > these features may impact design and functionality and create
> > > > unnecessary technical debt.
> > > >
> > > >
> > > > Personally I argue for targeting only 3.6+ as
> > > > - 3.5 will go EOL in 388 days and a potential MXNet 2 release
> together
> > > >   with our Semantic Versioning backwards compatibility guarantees
> would
> > > >   keep us "stuck" on 3.5 for the years to come. JetBrains 2018 survey
> > > >   showed only 11% of users used 3.5.
> > > > - 3.6 introduced a number of fundamental and relevant changes that we
> > > >   may want to build on and for which we can expect user adoption to
> > > >   increase over the years (thus MXNet should try to be compatible).
> > > >   - "PEP 526: Syntax for variable annotations" which we may even be
> > able
> > > > to use for shape typing along the lines of numpy
> > > >
> > > >
> > >
> >
> https://docs.google.com/document/d/1vpMse4c6DrWH5rq2tQSx3qwP_m_0lyn-Ij4WHqQqRHY/
> > > >   - asyncio module is stable with 3.6 and associated 3.7 language
> > > > features such as contextvars only have backports for 3.6. Some
> > parts
> > > > of Gluon currently rely on thread-local state, which is not
> correct
> > > > if users call MXNet from within asyncio code.
> > > >   Locking ourselves to 3.5 means we can't support these and may
> provide
> > > >   a bad user-experience in coming years.
> > > > - Part of the Ecosystem (GluonNLP) only support 3.6+ anyways.
> > > >
> > > > I would also like to cite James MacGlashan to point out how targeting
> > > > 3.6+ could help usability and attract more users:
> > > >
> > > >   Pipe dream: I'd love it if Mxnet not only dropped Python 2 support
> > for
> > > >   a more consistent design, but also went all in on Python 3.6 for
> type
> > > >   hint integration. There are enough different types involved in
> MXNet
> > > >   that types can help clarify usage, particularly for disambiguating
> > > >   symbol vs ndarray vs list vs tuple; tuple of ints rather than tuple
> > of
> > > >   floats; etc.
> > > >
> > > >
> > >
> >
> https://github.com/apache/incubator-mxnet/issues/8703#issuecomment-520881450
> > > >
> > > > Thus we can see targeting 3.6+ as a great opportunity for the MXNet
> > > > project!
> > > >
> > > > Best regards
> > > > Leonard
> > > >
> > > > "Srivastava, Rohit Kumar" 
> writes:
> > > > > +1
> > > > >
> > > > > On 7/19/19, 12:59 PM, "Zhu Zhaoqi"  wrote:
> > > > >
> > > > > +1
> > > > >
> > > > > Lin Yuan  wrote on Fri, Jul 19, 2019 at 12:06 AM:
> > > > >
> > > > > > +1
> > > > > >
> > > > > > On Fri, Jul 19, 2019 at 12:03 AM Chaitanya Bapat <
> > > > chai.ba...@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > > +1 definitely.
> > > > > > >
> > > > > > > Going forward,
> > > > > > > MXNet repo as it stands has ~95,000+ lines of Python code
> [1]
> > > > > > > OpenEdx has a million (10x) LOC and this mammoth effort of
> > > > porting from
> > > > > > > Python 2 to 3 is treated as a separate project named
> > > Incremental
> > > > > > > Improvement. [2]
> > > > > > > We can take inspiration from them and have a similar effort
> > by
> > > > calling
> > > > > > > action from the community. Issues can be maintained in a
> > > > separate JIRA
> > > > > > > board to track high priority tasks.
> > > > > > >
> > > > > > > Also, I can see gluon-nlp adding themselves to the Python3
> > > > statement.
> > > > > > Once
> > > > > > > the vote passes, one of us could submit a PR to add MXNet
> as
> > > > well.
> > > > > > >
> > > > > > > [1] https://codeclimate.com/
> > > > > > > [2]
> > > > > > >
> > > > > >
> > > >
> > >
> >
> https://open.edx.org/blog/python-2-is-ending-we-need-to-move-to-python-3/
> > > > > > >
> > > > > > >
> > > > > > > On Thu, 18 Jul 2019 at 21:39, Kshitij Kalambarkar <
> > > 
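Leonard's point above — that parts of Gluon rely on thread-local state, which is not correct when called from asyncio code, while `contextvars` (stable in the 3.6/3.7 era) behaves correctly — can be illustrated with a short stdlib-only sketch. This is plain Python, not MXNet code; the "device" variable is purely illustrative:

```python
import asyncio
import contextvars
import threading

# Hypothetical per-call state (e.g. a current compute context).
# threading.local() is shared by all coroutines on the same event-loop
# thread, so concurrent tasks clobber each other's value.
_tls = threading.local()

# A ContextVar gives each asyncio task its own copy of the value.
_ctx = contextvars.ContextVar("device", default="cpu")

async def use_device(name):
    _tls.device = name
    _ctx.set(name)
    await asyncio.sleep(0.01)  # yield so the two tasks interleave
    # thread-local may now hold the *other* task's value; ContextVar won't
    return _tls.device, _ctx.get()

async def main():
    return await asyncio.gather(use_device("gpu0"), use_device("gpu1"))

if __name__ == "__main__":
    print(asyncio.run(main()))
```

Running this, the first task's thread-local slot has been overwritten by the second task before it resumes, while its `ContextVar` value survives — exactly the failure mode a 3.5-only target would lock the project into.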

Fwd: ApacheCon Europe 2019: Join our Hackathon!

2019-08-24 Thread Haibin Lin
-- Forwarded message -
From: Sally Khudairi 
Date: Wed, Aug 21, 2019 at 9:23 AM
Subject: ApacheCon Europe 2019: Join our Hackathon!
To: 


Dear Apache Committers,

There will be a hackathon space at ApacheCon Europe 2019 in Berlin. It will
be available on 23rd/24th October from the start of each day’s schedule
until 7:00 PM, with possible exceptions, e.g. during keynotes.

We want to invite everybody to participate in the hackathon. Collaborative
development on project source code, improvements to project documentation,
and development of example apps or tools built upon one or more Apache
projects are all encouraged. Furthermore, it's a nice opportunity to get in
touch with other contributors.

More details will be available leading up to the event, but here’s
generally what to expect:

- Dedicated space with chairs, power, wifi, snacks, and caffeine.
- Tables dedicated to specific participating projects.
- ‘Getting Started’ discussions for new and aspiring committers
- (industrial) IoT Corner with industrial equipment to code against

With most logistical concerns now attended to, we are identifying
interested participants, promoters, and coordinators. Are you a PMC member
or committer willing to do the following on behalf of your project:

- Operate a dedicated hackathon table for some time
- Designate collaborative work for your project’s table to help hackers
focus
- Encourage your community to hack at your project’s table
- Spread the word throughout your community!

Interested? Email plann...@apachecon.com to reserve a table slot for your
project that we can advertise to your community.

We are currently working out a Hackathon schedule that will be posted,
letting attendees know when they should expect activities related to their
project and various open sessions to take place. If you will take on the
lead role for your project, we will work with you to plan, prepare, and
generate interest in the wider community.

If you aren’t interested in hacking yourself but want to help others, we
could also use persons who are willing to do one or more of the following:

- Operate the primary information table in the hackathon space for
several hours
- Give a short introductory tools or skills presentation / answer questions

Thanks!
ACEU Hackathon Organization Team

P.S. We invite anyone who has ideas to share with the planning committee,
or is considering participating in any way, to let us know via
d...@community.apache.org


Re: [VOTE] Release Apache MXNet (incubating) version 1.5.0.rc1

2019-06-20 Thread Haibin Lin
In GluonNLP we are testing with the MXNet nightly build for each PR, and we did
find some MXNet-related issues caught by the CI.
I recommend other toolkits also add integration tests with MXNet nightly.
It helps identify issues early.

Best,
Haibin

On Thu, Jun 20, 2019 at 18:52 Zhao, Patric  wrote:

> Thanks to raise the issue and we will take a look ASAP.
>
> The downstream cases are not in the MXNet CI, so it's hard to catch the
> potential bugs or performance degradation for MXNet developers.
>
> In the future, I suggest adding the major downstream test cases, like from
> sockeye, GluonNLP, GluonCV, DGL, Gluon-TS, into the nightly test.
> If it's still too heavy, maybe test it weekly or monthly :)
>
> Thanks,
>
> --Patric
>
> > -Original Message-
> > From: Anirudh Subramanian [mailto:anirudh2...@gmail.com]
> > Sent: Friday, June 21, 2019 9:31 AM
> > To: dev@mxnet.incubator.apache.org
> > Cc: d...@mxnet.apache.org
> > Subject: Re: [VOTE] Release Apache MXNet (incubating) version 1.5.0.rc1
> >
> > Hi Lai,
> >
> > I have opened an issue:
> > https://github.com/apache/incubator-mxnet/issues/15297
> > I came to know about this issue only today and I have not been monitoring
> > sockeye.
> > I jumped onto this issue to make sure it wasn't caused by the dlpack
> changes.
> > Also, I don't think sockeye CI checks against master; it is using 1.4.1.
> >
> > Anirudh
> >
> >
> > On Thu, Jun 20, 2019 at 6:17 PM Lai Wei  wrote:
> >
> > > Hi,
> > >
> > > Could you share which test failed and what’s the crash? How to
> > > reproduce it?
> > >
> > > I was able to install sockeye and all tests passed, using python
> > > setup.py test.
> > >
> > > I have tested both nightly pip package and 1.5.0.rc1
> > >
> > > It would be great to create an issue with reproducible steps and move
> > > the discussion there.
> > >
> > > Also I see the sockeye nightly build [1] has been failing for some time; if
> > > it’s due to an MXNet change, please raise this early so we can track and
> > > solve it in time rather than block the release during vote time.
> > >
> > > [1] https://travis-ci.org/awslabs/sockeye
> > >
> > >
> > > On Fri, Jun 21, 2019 at 7:01 AM Anirudh Subramanian
> > >  > > >
> > > wrote:
> > >
> > > > I was able to reproduce a crash with the commit
> > > > 09202f7f261954383aa387144524d38f83f18d06 but not with the commit
> > > > a862270beb2d796c1ba311183f7f4a766a18ad6c.
> > > >
> > > > Anirudh
> > > >
> > > > On Thu, Jun 20, 2019 at 3:53 PM Lai Wei  wrote:
> > > >
> > > > > Hi Przemyslaw,
> > > > >
> > > > > Is there an issue with more details to track the problem?
> > > > >
> > > > >
> > > > > On Fri, Jun 21, 2019 at 6:04 AM Przemysław Trędak
> > > > > 
> > > > > wrote:
> > > > >
> > > > > > -1
> > > > > >
> > > > > > There is a crash in sockeye unit test (python setup.py test)
> > > > > > observed starting with nightly 1.5 build from 6/13 and still
> > > > > > occurring in
> > > > 1.5rc1. I
> > > > > > don't yet have the exact commit that is responsible for it, but
> > > > > > it is either a862270beb2d796c1ba311183f7f4a766a18ad6c (dlpack
> > > > > > related) or
> > > > > > 09202f7f261954383aa387144524d38f83f18d06 (cached op
> > optimization).
> > > > > >
> > > > > > On 2019/06/20 06:36:22, Lai Wei  wrote:
> > > > > > > Dear MXNet community,
> > > > > > >
> > > > > > > This is the 3-day vote to release Apache MXNet (incubating)
> > > > > > > version
> > > > > > 1.5.0.
> > > > > > > Voting on dev@ will start June 19, 23:59:59(PST)  and close on
> > > June
> > > > > 22,
> > > > > > > 23:59:59.
> > > > > > >
> > > > > > > 1) Link to release notes:
> > > > > > >
> > > > https://cwiki.apache.org/confluence/display/MXNET/1.5.0+Release+Note
> > > > s
> > > > > > >
> > > > > > >
> > > > > > > 2) Link to release candidate:
> > > > > > >
> > > > > > > https://github.com/apache/incubator-mxnet/releases/tag/1.5.0.r
> > > > > > > c1
> > > > > > >
> > > > > > >
> > > > > > > 3) Link to source and signatures on apache dist server:
> > > > > > >
> > > > > > > https://dist.apache.org/repos/dist/dev/incubator/mxnet/1.5.0.r
> > > > > > > c1/
> > > > > > >
> > > > > > >
> > > > > > > Please remember to TEST first before voting accordingly:
> > > > > > >
> > > > > > > +1 = approve
> > > > > > > +0 = no opinion
> > > > > > > -1 = disapprove (provide reason)
> > > > > > > --
> > > > > > > Best Regards
> > > > > > >
> > > > > > > Lai
> > > > > > >
> > > > > >
> > > > > --
> > > > > Best Regards
> > > > >
> > > > > Lai
> > > > >
> > > >
> > > --
> > > Best Regards
> > >
> > > Lai
> > >
>


Re: [DISCUSS] 1.5.0 Release Plan

2019-05-31 Thread Haibin Lin
Hi dev@,

Quick update on the gluonnlp issue. Lai and I worked together to test
gluonnlp and MXNet with different configurations, and found that the use of
GELU operator in fp16 is causing the divergence. It was a very recent
change in gluonnlp, and it can be avoided by reverting the change in
GluonNLP. This doesn't block 1.5 release anymore.

Best,
Haibin

On Thu, May 30, 2019 at 11:33 AM Lai Wei  wrote:

> Hi dev@,
>
> Quick update on the 1.5.0 release, all previous tracked PRs have been
> merged and CI is back to normal again, please rebase your PR.
> Again, I would like to encourage downstream projects to test against latest
> MXNet now to discover bugs and regressions early, really appreciate your
> help.
>
> We still have 3 new open issues/PRs to track:
> 1. Gluon NLP BERT training Haibin mentioned
> 2. https://github.com/apache/incubator-mxnet/pull/15039
> 3. https://github.com/apache/incubator-mxnet/pull/15097
>
> Thanks!
>
> Best Regards
>
> Lai
>
>
> On Tue, May 28, 2019 at 9:32 AM Haibin Lin 
> wrote:
>
> > Hi dev@,
> >
> > I was testing GluonNLP with MXNet master, and found that BERT training
> > crashes a few hours after I launch the job. I can confirm that MXNet pip
> > package 20190412 works fine. I am bisecting changes in MXNet/GluonNLP to
> > check what causes the problem. I'll send an update as soon as I find the
> > root cause, or if I find any workaround.
> >
> > Thanks,
> > Haibin
> >
> > On Thu, May 23, 2019 at 2:12 AM Lin Yuan  wrote:
> >
> > > Hi Lai,
> > >
> > > One important PR that is currently blocked by a Flaky TensorRT test:
> > >
> > > https://github.com/apache/incubator-mxnet/pull/15041
> > >
> > > I have retriggered it several times. If it fails again, I may need CI
> > team
> > > to help disable this test. It has been reported by multiple people:
> > > https://github.com/apache/incubator-mxnet/issues/14978
> > >
> > > Thanks,
> > >
> > > Lin
> > >
> > > On Wed, May 22, 2019 at 11:38 PM Zhao, Patric 
> > > wrote:
> > >
> > > > Thanks, Lai.
> > > >
> > > > With the great help from the community, all PRs listed in the
> roadmap
> > > are
> > > > done :)
> > > >
> > > >
> > >
> >
> https://github.com/apache/incubator-mxnet/issues/14619#issuecomment-480110642
> > > >
> > > > Update the status of the below list
> > > >
> > > >  - [1] PR#14713 is almost done and waiting for internal validation
> results
> > > >  - [2] PR#14893 is merged
> > > >  - [3] PR#15031 is merged
> > > >  - [7] PR#15038 new PR to fix the bug in C++ interface, will be
> merged
> > > > soon after the review.
> > > >
> > > > Feel free to let me know if anything our team can help :)
> > > >
> > > > BR,
> > > >
> > > > --Patric
> > > >
> > > > > -Original Message-
> > > > > From: Lai Wei [mailto:roywei...@gmail.com]
> > > > > Sent: Thursday, May 23, 2019 6:05 AM
> > > > > To: dev@mxnet.incubator.apache.org
> > > > > Subject: Re: [DISCUSS] 1.5.0 Release Plan
> > > > >
> > > > > Hi @dev,
> > > > >
> > > > > Thanks for working hard for the 1.5 release. Since there have been
> > > several
> > > > > release blockers (mostly fixed), we are extending the code freeze
> to
> > > > Friday
> > > > > 05/22/2019. Right now we are tracking the following 5 open
> > > > PRs[1][2][3][4][5]
> > > > > and 1 issue[6]. Please let us know if you need more time.
> > > > >
> > > > > I would like to encourage all downstream projects to test with
> latest
> > > > MXNet
> > > > > to avoid any incompatibility in the coming 1.5.0 release. If you
> have
> > > any
> > > > > issues that may block the release, please let us know.
> > > > > Thank you very much.
> > > > >
> > > > > [1] https://github.com/apache/incubator-mxnet/pull/14713
> > > > > [2] https://github.com/apache/incubator-mxnet/pull/14893
> > > > > [3] https://github.com/apache/incubator-mxnet/pull/15031
> > > > > [4] https://github.com/apache/incubator-mxnet/pull/15039
> > > > > [5] https://github.com/apache/incubator-mxnet/pull/15041
> > > > > [6] https://github.com/apa

Re: [DISCUSS] 1.5.0 Release Plan

2019-05-28 Thread Haibin Lin
Hi dev@,

I was testing GluonNLP with MXNet master, and found that BERT training
crashes a few hours after I launch the job. I can confirm that MXNet pip
package 20190412 works fine. I am bisecting changes in MXNet/GluonNLP to
check what causes the problem. I'll send an update as soon as I find the
root cause, or if I find any workaround.

Thanks,
Haibin

On Thu, May 23, 2019 at 2:12 AM Lin Yuan  wrote:

> Hi Lai,
>
> One important PR that is currently blocked by a Flaky TensorRT test:
>
> https://github.com/apache/incubator-mxnet/pull/15041
>
> I have retriggered it several times. If it fails again, I may need CI team
> to help disable this test. It has been reported by multiple people:
> https://github.com/apache/incubator-mxnet/issues/14978
>
> Thanks,
>
> Lin
>
> On Wed, May 22, 2019 at 11:38 PM Zhao, Patric 
> wrote:
>
> > Thanks, Lai.
> >
> > With the great help from the community, all PRs listed in the roadmap
> are
> > done :)
> >
> >
> https://github.com/apache/incubator-mxnet/issues/14619#issuecomment-480110642
> >
> > Update the status of the below list
> >
> >  - [1] PR#14713 is almost done and waiting for internal validation results
> >  - [2] PR#14893 is merged
> >  - [3] PR#15031 is merged
> >  - [7] PR#15038 new PR to fix the bug in C++ interface, will be merged
> > soon after the review.
> >
> > Feel free to let me know if anything our team can help :)
> >
> > BR,
> >
> > --Patric
> >
> > > -Original Message-
> > > From: Lai Wei [mailto:roywei...@gmail.com]
> > > Sent: Thursday, May 23, 2019 6:05 AM
> > > To: dev@mxnet.incubator.apache.org
> > > Subject: Re: [DISCUSS] 1.5.0 Release Plan
> > >
> > > Hi @dev,
> > >
> > > Thanks for working hard for the 1.5 release. Since there have been
> several
> > > release blockers (mostly fixed), we are extending the code freeze to
> > Friday
> > > 05/22/2019. Right now we are tracking the following 5 open
> > PRs[1][2][3][4][5]
> > > and 1 issue[6]. Please let us know if you need more time.
> > >
> > > I would like to encourage all downstream projects to test with latest
> > MXNet
> > > to avoid any incompatibility in the coming 1.5.0 release. If you have
> any
> > > issues that may block the release, please let us know.
> > > Thank you very much.
> > >
> > > [1] https://github.com/apache/incubator-mxnet/pull/14713
> > > [2] https://github.com/apache/incubator-mxnet/pull/14893
> > > [3] https://github.com/apache/incubator-mxnet/pull/15031
> > > [4] https://github.com/apache/incubator-mxnet/pull/15039
> > > [5] https://github.com/apache/incubator-mxnet/pull/15041
> > > [6] https://github.com/apache/incubator-mxnet/issues/15034
> > >
> > >
> > > Best Regards
> > >
> > > Lai
> > >
> > >
> > > On Wed, May 15, 2019 at 9:05 PM Junru Shao 
> > > wrote:
> > >
> > > > Hi folks,
> > > >
> > > > Here I may have a release blocker for 1.5.0 about the implementation of
> > > > the dynamic shape mechanism, which somehow conflicts with Gluon's
> > > deferred
> > > > initialization [1].
> > > >
> > > > [1] https://github.com/dmlc/gluon-nlp/issues/706
> > > >
> > > > On Wed, May 15, 2019 at 12:09 PM Anirudh Subramanian <
> > > > anirudh2...@gmail.com>
> > > > wrote:
> > > >
> > > > > Hi Lai,
> > > > >
> > > > > From the discussion I had with Nvidia offline, they are targeting
> > > > pushing
> > > > > the required changes today.
> > > > > Since this is an important feature for the release, if this gets
> > > > > delayed and cannot be merged by 05/17/2019, the code freeze date
> > > > > may need to be changed.
> > > > >
> > > > > Anirudh
> > > > >
> > > > > On Wed, May 15, 2019 at 1:23 AM Lv, Tao A 
> > wrote:
> > > > >
> > > > > > Hi dev,
> > > > > >
> > > > > > We see there are several github issues [1][2][3][4] about mxnet
> > > > > > windows build experience. The team is working intensively
> > > > > > [5][6][7] on that to
> > > > > fix
> > > > > > some problems of MKL-DNN build on windows. We hope these fixes
> > > can
> > > > catch
> > > > > > the code freeze and finally enter the 1.5.0 release.
> > > > > >
> > > > > > The PR against mshadow (#374) was already merged and MXNet PR
> > > > > > #14877 is under review - great thanks to CI team for helping on
> > > > > > the MKL
> > > > > installation
> > > > > > request. PR #14952 is document change according to build logic
> > > > > > changes
> > > > in
> > > > > > PR #14877. So I think these two PRs should be merged
> > simultaneously.
> > > > > > Currently #14877 is experiencing a CI response problem.
> > > > > >
> > > > > > Please take your time to have a look at these two PRs. Your
> > > > > > comments
> > > > and
> > > > > > suggestions are highly appreciated.
> > > > > >
> > > > > > Thanks,
> > > > > > -tao
> > > > > >
> > > > > > [1] https://github.com/apache/incubator-mxnet/issues/14670
> > > > > > [2] https://github.com/apache/incubator-mxnet/issues/14335
> > > > > > [3] https://github.com/apache/incubator-mxnet/issues/14203
> > > > > > [4] https://github.com/apache/incubator-mxnet/issues/14085
> > > > > > [5] 

Re: direction for documentation across various APIs that share common doc source

2019-03-12 Thread Haibin Lin
Hi Aaron,

You can see that the examples listed in elemwise_addDoc class in
https://github.com/apache/incubator-mxnet/blob/master/python/mxnet/ndarray_doc.py#L57
are appended to the example section of elemwise_add op in
http://mxnet.incubator.apache.org/api/python/ndarray/ndarray.html?highlight=reshape#mxnet.ndarray.elemwise_add


You can take a look at the _build_doc function in ndarray_doc.py which
contains the logic to append examples from xxDoc classes. The
_build_doc function is called in _generate_ndarray_function_code when these
python functions are generated:
https://github.com/apache/incubator-mxnet/blob/e3a51b5a3ed989bf1e9c9f53b56819b32957527f/python/mxnet/ndarray/register.py#L54-L60


Best,
Haibin
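The append mechanism described above can be sketched in miniature. This is a simplified, hypothetical illustration of the pattern — the class names mirror MXNet's `elemwise_addDoc`, but the lookup logic and `_generate_function` helper here are illustrative stand-ins, not the actual code in ndarray_doc.py or register.py:

```python
# Sketch of the "append examples from a Doc class" pattern: XXDoc classes
# hold extra documentation, and a build step stitches their docstrings
# onto the docstrings of the generated operator functions.

class NDArrayDoc:
    """Base class for per-operator documentation add-ons."""


class elemwise_addDoc(NDArrayDoc):
    """
    Example
    -------
    >>> elemwise_add(1, 2)
    3
    """


def _build_doc(func_name, base_doc):
    # Find a subclass named '<op>Doc' and append its docstring, if any.
    for cls in NDArrayDoc.__subclasses__():
        if cls.__name__ == func_name + "Doc":
            return base_doc + (cls.__doc__ or "")
    return base_doc


def _generate_function(func_name, base_doc, impl):
    # Stand-in for code generation: attach the assembled docstring.
    impl.__name__ = func_name
    impl.__doc__ = _build_doc(func_name, base_doc)
    return impl


elemwise_add = _generate_function(
    "elemwise_add", "Adds arguments element-wise.\n", lambda a, b: a + b)
```

With this in place, `help(elemwise_add)` shows both the base description and the example section contributed by the Doc class — which is the effect visible on the rendered ndarray API page.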


On Wed, Mar 6, 2019 at 11:57 AM Aaron Markham 
wrote:

> Mu,
> Thanks for your response. I have some follow-up questions now. A lot
> actually.
> Can you explain more about what ndarray_doc.py is doing? I see that
> ndarray.register is calling it to do some transformations to
> docstrings by injecting "float". This seems quite buried to me. Some
> may have wondered tracing through the ndarray docs, "where does float
> come from? It's not listed here in the docstring. Strange."
> Is there a document that describes this pattern so other developers
> know how to use it and what impact it has? Why am I not seeing the
> pattern other than ndarray and symbol and only for float support? Why
> doesn't symbol have a correlating symbol_doc.py? This makes me wonder
> about the various issues I've seen where functions aren't properly
> described, or at all, when Sphinx runs. Is this something that should
> have been applied more widely, but has not? Wouldn't it make sense to
> have the docs massaging processes centralized for maintenance and
> clarity?
>
> Aside from that, I'm still not seeing the path for solving the issue
> with R and Scala and Java showing pseudocode or Python code in their
> examples by using `make doctest`. Maybe they're first steps to make
> sure Python examples execute, but don't extend a solution to any other
> language binding? That's fine if so, but I still want to keep
> exploring what we do to facilitate good docs for the other bindings.
> Are you perhaps suggesting that each language binding follow this
> rewriting of the docstrings pattern that's in ndarray and symbol?
>
> Can you look at this PR and provide feedback on a tangible example of
> how to proceed? https://github.com/apache/incubator-mxnet/pull/14243
>
> 
>
> Vishaal & Anton, thanks for your feedback too. Flagging the code makes
> a lot of sense as then it would be quite apparent what its intended
> language is. Rerunning sphinx to rewrite the output could work, and
> that assumes those packages have something specific and relevant to
> inject. Unfortunately for R, Scala, and Java, that doesn't seem to be
> the case as this point. Please correct me if I'm missing something
> here.
>
> 
>
> Cheers,
> Aaron
>
> On Tue, Mar 5, 2019 at 9:44 AM Mu Li  wrote:
> >
> > The original design is putting pseudo-code in cc files (e.g. ndarray.cc
> > <
> https://github.com/apache/incubator-mxnet/blob/master/src/ndarray/ndarray.cc
> >)
> > that are languange indepent, then having python codes in .py files (e.g.
> > ndarray_doc.py
> > <
> https://github.com/apache/incubator-mxnet/blob/master/python/mxnet/ndarray_doc.py
> >).
> > However, we haven't defined the pseudo-code format, so some code in cc
> files
> > looks like Python, and we didn't enable doctest, so some code in py files
> cannot
> > be executed.
> >
> > I suggest the following next steps:
> >
> > 1. follow TensorFlow's pseudo-code format, e.g.
> > https://www.tensorflow.org/versions/r2.0/api_docs/python/tf/fill
> > 2. enable doctest during building the doc (make doctest)
> >
> > On Mon, Mar 4, 2019 at 10:09 AM Vishaal Kapoor 
> > wrote:
> >
> > > Hey Aaron and  Anton,
> > >
> > > One of MXNet's strengths over other frameworks is the plethora of
> language
> > > bindings so having language specific examples is of importance. Perhaps
> > > indicating that an example is Python code by using a "#python" header
> on
> > > the example would make it clear.  Of course, for the important APIs,
> > > docstrings for the most popular languages would be desired.
> Additionally,
> > > making the holes clear would make it easier for users to contribute
> > > documentation for their favorite languages.
> > >
> > > Vishaal
> > >
> > > On Mon, Mar 4, 2019 at 8:34 AM Anton Chernov 
> wrote:
> > >
> > > > Hi Aaron,
> > > >
> > > > Here is an idea: The main documentation is the one in .cc files. In
> > > theory
> > > > the language bindings should just override some stuff from it, like
> > > > examples. If I understand correctly there is a sphinx script that
> > > generates
> > > > the documentation. If you run it first for the core src folder and then from
> a
> > > > language 

Re: "If" function in MXNET and sharing parameters

2019-03-08 Thread Haibin Lin
Hi Stanislas,

Did you consider nd/symbol.contrib.cond for conditional statements?
https://mxnet.incubator.apache.org/versions/master/tutorials/control_flow/ControlFlowTutorial.html

Best,
Haibin
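For context on the suggestion above: per the linked tutorial, `contrib.cond` takes a predicate and two branch functions, which lets the "if" live inside the computational graph. The semantics can be sketched in plain Python — this is an illustrative stand-in, not MXNet code, and `model`, `matrix_a`, and `matrix_b` are hypothetical names for the question's "different input matrix" scenario:

```python
# Plain-Python sketch of cond(pred, then_func, else_func) semantics:
# a single node that dispatches to one of two branch callables at
# runtime, mirroring the shape of mx.nd.contrib.cond.

def cond(pred, then_func, else_func):
    """Evaluate then_func() if pred is true, else else_func()."""
    return then_func() if pred else else_func()


def model(x, matrix_a, matrix_b):
    # Pick a different "input matrix" depending on the actual input,
    # as asked in the question above; both branches stay in the graph.
    weights = cond(x > 0, lambda: matrix_a, lambda: matrix_b)
    return [x * w for w in weights]
```

In real MXNet code the branch functions would return NDArrays/Symbols, and parameters referenced inside either branch are shared automatically since both branches belong to the same graph.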

On Fri, Mar 8, 2019 at 10:29 Lauly, Stanislas 
wrote:

> Hi,
>
> About the MXNet Module API: I need to create a model that uses a different
> input matrix depending on the actual input. Is there a kind of “if” that can be
> part of the computational graph? If not, I can always create multiple
> models that share some params between them, but doing so does not sound
> trivial. I found this:
>
>
> https://discuss.mxnet.io/t/sharing-parameters-between-two-modules-through-arg-dict/1046
>
> But it looks like it is not very efficient. Is there a better way to share
> parameters between models?
>
>
> Stanislas Lauly
> Applied scientist – Amazon AI
>


Re: [VOTE] Release Apache MXNet (incubating) version 1.4.0.rc3

2019-02-19 Thread Haibin Lin
+1
Built from source on Ubuntu and it passed kvstore unit tests.

Best,
Haibin

On Tue, Feb 19, 2019 at 10:03 AM Piyush Ghai  wrote:

> Hi all,
>
> I still need more votes from PMC members in order to conclude this vote.
>
> PMC members, please TEST and vote accordingly. Your votes will help us
> release MXNet version soon.
> I’m extending the voting thread until Feb 20th 12 AM PST.
>
> Best regards,
> Piyush
>
>
> On 2019/02/16 02:01:25, Piyush Ghai  wrote:
> > Dear MXNet community,
> >
> > I would like to propose a vote to release Apache MXNet (incubating)
> > version v1.4.0.
> > Voting will start today, Friday February 15th 6pm PST and will close on
> > Monday, February 18th 6pm PST.
> >
> > Link to release notes:
> > https://cwiki.apache.org/confluence/display/MXNET/Apache+MXNet+%28incubating%29+1.4.0+Release+Notes
> >
> > Link to release candidate 1.4.0.rc3:
> > https://github.com/apache/incubator-mxnet/releases/tag/1.4.0.rc3
> >
> > Link to source and signatures on apache dist server:
> > https://dist.apache.org/repos/dist/dev/incubator/mxnet/1.4.0.rc3/
> >
> > Please remember to TEST first before voting accordingly:
> > +1 = approve
> > +0 = no opinion
> > -1 = disapprove (provide reason)
> >
> > Best regards,
> > Piyush


Re: [VOTE] Release Apache MXNet (incubating) version 1.4.0.rc2

2019-02-04 Thread Haibin Lin
+1. Built from source on Linux and passed the dist sync kvstore test.

On Mon, Feb 4, 2019 at 9:54 AM Lin Yuan  wrote:

> +1. Built from source on macOS 10.13.6 and tested the mxnet-to-coreml converter.
>
> On Mon, Feb 4, 2019 at 9:03 AM Indhu  wrote:
>
> > +1
> >
> > Build from source and tested few examples from the examples folder.
> >
> > Thanks,
> > Indu
> >
> >
> >
> > On Fri, Feb 1, 2019 at 6:21 PM Steffen Rochel 
> > wrote:
> >
> > > Hi Sheng - thanks for the feedback.
> > > TVM notice  file is missing as the 1.4.x branch/v1.4.0 release is using
> > TVM
> > > commit 0f053c8
> > > <
> > >
> >
> https://github.com/dmlc/tvm/commit/0f053c82a747b4dcdf49570ec87c17e0067b7439
> > > >
> > >  from Oct 8, 2018, which didn't have the NOTICE file. IMHO, MXNet
> NOTICE
> > > file is consistent with release content.
> > > As the release started in 2018, I do think it is OK to move forward w/o
> > > updating to 2019.
> > >
> > > All -
> > > thanks to the committers/contributors (Tao, Aaron, Kellen, Aston, Yuxi)
> > who
> > > tested and provided feedback - we have five +1 votes.
> > > As of today, Friday Feb 1st 2019 6pm PST we have two binding votes, one
> > +1
> > > (Carin), one +0 (Sheng). The vote continues be open waiting for
> feedback
> > > from PMC members.
> > > Hope you can spare some time over the weekend to provide feedback.
> > >
> > > Regards,
> > > Steffen
> > >
> > > On Fri, Feb 1, 2019 at 12:44 AM Marco de Abreu <
> marco.g.ab...@gmail.com>
> > > wrote:
> > >
> > > > Considering the release process has been started last year and the
> code
> > > tag
> > > > has also been based on last year, I'd say that it is not really a big
> > > deal.
> > > >
> > > > -Marco
> > > >
> > > > Am Fr., 1. Feb. 2019, 09:33 hat Sheng Zha 
> > > > geschrieben:
> > > >
> > > > > I found an awesome checklist for incubator releases [1] so I'm
> using
> > it
> > > > > here:
> > > > >
> > > > > -[Y] Are release files in correct location?
> > > > > -[Y] Do release files have the word incubating in their name?
> > > > > -[Y] Are the digital signature and hashes correct?
> > > > > -[Y] Does DISCLAIMER file exist?
> > > > > -[Y] Do LICENSE and NOTICE files exists?
> > > > > -[N/A] Is the LICENSE and NOTICE text correct? (sz: did not finish
> > > > > checking)
> > > > > -[N] Is the NOTICE year correct?
> > > > > -[N/A] Un-included software dependencies are not mentioned in
> LICENSE
> > > or
> > > > > NOTICE? (sz: did not finish checking)
> > > > > -[Y] License information is not mentioned in NOTICE?
> > > > > Is there any 3rd party code contained inside the release? If so:
> > > > > -[Y] Does the software have a compatible license?
> > > > > -[Y] Are all software licenses mentioned in LICENSE?
> > > > > -[Y] Is the full text of the licenses (or pointers to it) in
> LICENSE?
> > > > > Is any of this code Apache licensed? Do they have NOTICE files? If
> > so:
> > > > > -[N] Have relevant parts of those NOTICE files been added to this
> > > NOTICE
> > > > > file?
> > > > > TVM has Apache 2.0 license and its NOTICE hasn't been added to
> > MXNet's
> > > > > NOTICE file.
> > > > > -[Y] Do all source files have ASF headers? (sz: enforced by license
> > > > > checker)
> > > > > -[Y] Do the contents of the release match with what's tagged in
> > version
> > > > > control?
> > > > > -[N] Are there any unexpected binary files in the release?
> > > > > -[Y] Can you compile from source? Are the instruction clear?
> > > > >
> > > > > Is the issue minor?
> > > > > - Unsure. NOTICE year is wrong (it's 2019 now). TVM's NOTICE is
> > missing
> > > > > from MXNet's NOTICE file.
> > > > > Could it possibly be fixed in the next release?
> > > > > - Yes
> > > > > I vote with:
> > > > > +0 not sure if it should be released. Could mentors advise if we
> > should
> > > > fix
> > > > > them before release?
> > > > >
> > > > > [1] https://wiki.apache.org/incubator/IncubatorReleaseChecklist
> > > > >
> > > > >
> > > > > On Thu, Jan 31, 2019 at 10:56 PM Lv, Tao A 
> > wrote:
> > > > >
> > > > > >
> > > > > > +1. Verified below items:
> > > > > >
> > > > > > 1. Checkout code from tag 1.4.0rc2 and build mkldnn backend
> > > > successfully
> > > > > > on both cpu and gpu w/ mkl and openblas
> > > > > > 2. ResNet50v1 FP32 performance looks good for both latency and
> > > > throughput
> > > > > > 3. Quantization script works well with ResNet50v1
> > > > > > 4. ResNet50v1 INT8 model accuracy looks good
> > > > > > 5. ResNet50v1 INT8 model performance speedup looks good for both
> > > > latency
> > > > > > and throughput
> > > > > >
> > > > > >
> > > > > > -Original Message-
> > > > > > From: kellen sunderland [mailto:kellen.sunderl...@gmail.com]
> > > > > > Sent: Friday, February 1, 2019 11:45 AM
> > > > > > To: dev@mxnet.incubator.apache.org
> > > > > > Subject: Re: [VOTE] Release Apache MXNet (incubating) version
> > > 1.4.0.rc2
> > > > > >
> > > > > > Great, thanks Steffen!  I added a few key files but missed that
> > one.
> > > > > >
> > > > > > +1 from me.

Re: Taxonomy on our cwiki

2019-01-18 Thread Haibin Lin
+1

Will there be broken links? I thought Confluence would show "page is now
moved to https://xxx.html" to redirect users when this kind of reorg
happens.

Best,
Haibin

On Fri, Jan 18, 2019 at 4:50 PM Aaron Markham 
wrote:

> +1 but note that this is probably going to create a bunch of broken links
> on the MXNet website and maybe elsewhere. Should make time to deal with
> that in this process.
>
> On Fri, Jan 18, 2019, 12:43 Carin Meier 
> > +1 Great idea
> >
> > On Fri, Jan 18, 2019 at 2:38 PM Sheng Zha  wrote:
> >
> > > Hi MXNet,
> > >
> > > Given that currently cwiki is the only place other than mxnet website
> for
> > > mxnet-related documentation, I'd like to request your attention to the
> > > (slightly disorganized) cwiki page of MXNet. The top level folders (and
> > > their contents) currently looks like this:
> > > - Design Proposals* (bag of proposals, not in order)
> > > - Development* (mixture of guides, roadmaps, processes)
> > > - Release Process (release notes)
> > > - Website (guides and proposals)
> > > - MXNet Clojure (call for contribution, guides)
> > > - MXNet Keras Integration (design)
> > > - MXNet-ONNX Integration (design, dev status)
> > > - MXNet R Package (guide, backlog)
> > > - MXNet-Scala (design, dev status, guide)
> > > - Content Formatting Templates (not a folder but link to two docs)
> > > - How-to articles (1 guide)
> > > - Community (guide on apache-related processes)
> > > - Data IO (designs)
> > > - Continuous Integration (guides, designs)
> > > - Meetups and Hangouts (events)
> > >
> > > And here are two good examples from successful Apache projects:
> > > - Apache Flink: an **audience-oriented** structure [1]
> > >   Users (Presentations and How-to)
> > >   Contributors (Dev processes and How-to)
> > >   Committers (Infra, Dev processes, Release processes, Releases)
> > >   Roadmaps and Feature Designs (archive)
> > > - Apache OpenNLP: a **content-oriented** structure [2]
> > >   Guides
> > >   External Resources
> > >   Proposals
> > >   Releasing
> > >
> > > Clean organization helps content discovery and saves time on locating
> > > useful content. Given that we have a good amount of content on the wiki
> > page,
> > > I suggest that we decide on a cleaner taxonomy, re-organize contents
> > > accordingly, and add future contents accordingly. To provide a starting
> > > point for the discussion, I suggest:
> > > - Given the state we are in, start with content-oriented organization,
> > use
> > > these top-level categories: Guides (including processes and how-tos),
> > > Development (including designs, proposals, notes, roadmaps), Community
> > > (including events, activities, external resources and contents)
> > > - If people strongly prefer audience-oriented structure, later we can
> > adopt
> > > a structure similar to Flink's.
> > >
> > > Feel free to share your thoughts and preferences here. Thanks.
> > >
> > > -sz
> > >
> > > [1]
> > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/Apache+Flink+Home
> > > [2] https://cwiki.apache.org/confluence/display/OPENNLP/Index
> > >
> >
>


Re: [ANNOUNCE] MKLDNN becomes the default CPU backend in Apache/MXNet master branch

2019-01-12 Thread Haibin Lin
Awesome work!

On Sat, Jan 12, 2019 at 1:25 AM Lv, Tao A  wrote:

> Thanks for the great collaboration through the community to make things
> happen. :)
>
>
> -Original Message-
> From: Jun Wu [mailto:wujun@gmail.com]
> Sent: Saturday, January 12, 2019 12:54 PM
> To: dev@mxnet.incubator.apache.org
> Cc: u...@mxnet.apache.org
> Subject: Re: [ANNOUNCE] MKLDNN becomes the default CPU backend in
> Apache/MXNet master branch
>
> Great work on boosting the MXNet/MKLDNN performance significantly!
>
> On Fri, Jan 11, 2019 at 7:08 PM Li, Mu  wrote:
>
> > Awesome job! That’s a great benefit to CPU users
> >
> > Best
> > Mu
> >
> > > On Jan 11, 2019, at 6:59 PM, Zhao, Patric 
> wrote:
> > >
> > > Dear all,
> > >
> > > I am pleased to announce that the MKLDNN is the default CPU backend in
> > the master branch for the Linux platform now.
> > > (note: the nightly build and release doesn't change)
> > >
> > > Really thanks to the great supports and joint works from the community.
> > >
> > > Feedbacks are highly appreciated :)
> > >
> > > Related links:
> > >
> > > 1.   Integration design:
> >
> https://cwiki.apache.org/confluence/display/MXNET/The+design+of+MKLDNN+integration
> > >
> > > 2.   Performance and accuracy:
> >
> https://cwiki.apache.org/confluence/display/MXNET/MXNet+with+Intel+MKL-DNN+-+Performance+Benchmarking
> > >
> > > 3.   MKLDNN README:
> > https://github.com/apache/incubator-mxnet/blob/master/MKLDNN_README.md
> > >
> > > Thanks,
> > >
> > > --Patric
> > >
> >
>


Re: Incubator Podling Report (Due 2nd January)

2019-01-02 Thread Haibin Lin
Dear MXNet mentors,

Please review and approve quarterly podling report at
https://wiki.apache.org/incubator/January2019. The report was created by
myself (PPMC member) with contributions from Steffen. Sergey Kolychev (PPMC
member) reviewed and approved. Thanks.

Best,
Haibin

On Wed, Jan 2, 2019 at 2:00 PM Haibin Lin  wrote:

> Dear MXNet community,
>
> The section for "How has the community developed since the last report?"
> has been updated with recent developments in the MXNet ecosystem. Please
> feel free to provide feedback on the draft report. Thanks.
>
> Best,
> Haibin
>
> On Wed, Jan 2, 2019 at 1:23 PM Michael Wall  wrote:
>
>> Thanks Steffen and Haibin.
>>
>> On Wed, Jan 2, 2019 at 3:09 PM Steffen Rochel 
>> wrote:
>>
>> > Dear MXNet community -
>> > Haibin and I are working on the MXNet report. See a draft at
>> > https://wiki.apache.org/incubator/January2019.
>> > Please provide feedback by 5pm PST today, so the report can be
>> submitted in
>> > time by EOB today.
>> >
>> > Thanks,
>> > Steffen
>> >
>> > On Tue, Jan 1, 2019 at 10:39 PM Justin Mclean 
>> wrote:
>> >
>> > > Hi,
>> > >
>> > > The report is due today. [1] If you cannot report this month, it will
>> > > be noted in the board report and you'll be asked to report next month.
>> > >
>> > > Thanks,
>> > > Justin
>> > >
>> > > 1. https://wiki.apache.org/incubator/January2019
>> > >
>> >
>>
>


Re: Incubator Podling Report (Due 2nd January)

2019-01-02 Thread Haibin Lin
Dear MXNet community,

The section for "How has the community developed since the last report?"
has been updated with recent developments in the MXNet ecosystem. Please feel
free to provide feedback on the draft report. Thanks.

Best,
Haibin

On Wed, Jan 2, 2019 at 1:23 PM Michael Wall  wrote:

> Thanks Steffen and Haibin.
>
> On Wed, Jan 2, 2019 at 3:09 PM Steffen Rochel 
> wrote:
>
> > Dear MXNet community -
> > Haibin and I are working on the MXNet report. See a draft at
> > https://wiki.apache.org/incubator/January2019.
> > Please provide feedback by 5pm PST today, so the report can be submitted
> in
> > time by EOB today.
> >
> > Thanks,
> > Steffen
> >
> > On Tue, Jan 1, 2019 at 10:39 PM Justin Mclean 
> wrote:
> >
> > > Hi,
> > >
> > > The report is due today. [1] If you cannot report this month, it will
> > > be noted in the board report and you'll be asked to report next month.
> > >
> > > Thanks,
> > > Justin
> > >
> > > 1. https://wiki.apache.org/incubator/January2019
> > >
> >
>


Re: Apache MXNet v1.4.0 release status

2018-12-19 Thread Haibin Lin
Hi Steffen,

Aston and I would like to bring this PR to your attention:
https://github.com/apache/incubator-mxnet/pull/13686, where Zhi fixed the
num_worker argument of DataLoader on windows. Without this fix, using
DataLoader with num_worker > 0 would result in crash on Windows. Bringing
this PR to 1.4.x would greatly benefit windows users of MXNet. Aston is
working on the Dive into Deep Learning book based on MXNet, which is due and frozen
for publication next week. Currently the book will depend on MXNet 1.4.0
and discourages readers from using multi-worker DataLoaders due to this bug
on Windows. With this fix Aston can update the examples in the book with
DataLoader using multiple workers, which will be very beneficial to the
broader MXNet community.

Best,
Haibin

On Mon, Dec 17, 2018 at 6:11 AM Pedro Larroy 
wrote:

> Hi Steffen
>
> Added some notes in your PR for the release notes.
>
> In particular, I'm a bit concerned about the status of topology aware
> communication, since it has open issues and is not being tested in CI.
> (The tests also fail). I think we should anounce it when it's working
> properly and it's well tested.
>
> Pedro.
>
> On Sat, Dec 15, 2018 at 11:06 AM Steffen Rochel 
> wrote:
> >
> > Dear MXNet community -
> > all issues beside one
> >  have been
> > addressed. I suggest to document the last remaining issue as known
> problem
> > and move forward with the release.
> > Please communicate if you have concerns of know about critical issues to
> be
> > addressed before starting vote about releasing 1.4.0 as soon as possible.
> > Please also have a look at the release notes
> > <
> https://cwiki.apache.org/confluence/display/MXNET/Apache+MXNet+%28incubating%29+1.4.0+Release+Notes
> >
> > and provide feedback.
> >
> > I'm planing to start voting beginning of next week.
> > Steffen
> >
> > On Sat, Dec 8, 2018 at 8:31 PM Steffen Rochel 
> > wrote:
> >
> > > Hi Pedro - this are indeed the draft release notes for v1.4.0. Please
> add
> > > description as you suggested.
> > >
> > > All - please have a look at the release notes and provide feedback and
> > > suggestions..
> > > Steffen
> > > On Sun, Dec 9, 2018 at 3:30 AM Zhao, Patric 
> wrote:
> > >
> > >> Hi Steffen,
> > >>
> > >> I saw the draft of 1.4 release notes in here (
> > >>
> https://cwiki.apache.org/confluence/display/MXNET/Apache+MXNet+%28incubating%29+1.4.0+Release+Notes
> > >> ).
> > >>
> > >> Is this near the final version?  I'd like to add some descriptions of
> new
> > >> quantization features enabled in 1.4.
> > >>
> > >> Is it OK?
> > >>
> > >> Thanks,
> > >>
> > >> --Patric
> > >>
> > >>
> > >> > -Original Message-
> > >> > From: Steffen Rochel [mailto:steffenroc...@gmail.com]
> > >> > Sent: Saturday, December 8, 2018 1:12 AM
> > >> > To: dev@mxnet.incubator.apache.org
> > >> > Subject: Apache MXNet v1.4.0 release status
> > >> >
> > >> > Dear MXNet community -
> > >> > I would like to provide update on v1.4.0 status, details are tracked
> > >> here
> > >> > <https://cwiki.apache.org/confluence/display/MXNET/Apache+MXNet+%28incubating%29+1.4.0+Release+Plan+and+Status>
> > >> > .
> > >> >
> > >> > Thank you very much for everybody effort to resolve the identified
> > >> issues.
> > >> > We are down to 3 open issues - for details please see
> > >> > https://cwiki.apache.org/confluence/display/MXNET/Apache+MXNet+%28incubating%29+1.4.0+Release+Plan+and+Status#ApacheMXNet(incubating)1.4.0ReleasePlanandStatus-OpenPRstotrack
> > >> > Please help to resolve the remaining issues and integrate to v1.4.x
> > >> branch.
> > >> > Current estimate to address the identified security vulnerabilities
> in
> > >> the
> > >> > Scala/Java package and merge into v1.4.x branch is end of next week
> > >> > (December 14th) I will communicate as soon I have more information.
> > >> >
> > >> > Regards,
> > >> > Steffen
> > >>
> > >
>


Re: trouble with foreach operator in conjunction with multiple GPUs

2018-12-08 Thread Haibin Lin
Hi Tali,

Yes I think currently the foreach API is experimental and multi-device
support is future work. The existing implementation uses the main thread to
wait for execution result and does not handle the case for data parallel
training on multiple GPUs. However, if you use Gluon, you can probably work
around this problem with Python threads, where each thread
performs the forward operation on one GPU. I submitted a PR for such a utility in
gluon-nlp: https://github.com/dmlc/gluon-nlp/pull/387/files
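The thread-per-device workaround can be sketched in pure Python. This is a minimal sketch only: `forward_on_device` is a placeholder standing in for a per-GPU forward call (in practice each thread would run its own executor's forward pass on data placed on that device's context), not the actual gluon-nlp utility.

```python
from concurrent.futures import ThreadPoolExecutor

def forward_on_device(device_id, batch):
    # Placeholder for the real per-GPU forward call. The blocking subgraph
    # execution happens inside this thread, so the other devices' forward
    # calls are not serialized behind it.
    return [x * 2 for x in batch]

def parallel_forward(batches):
    # One thread per device: each thread issues (and blocks on) its own
    # forward pass independently, restoring multi-device parallelism.
    with ThreadPoolExecutor(max_workers=len(batches)) as pool:
        futures = [pool.submit(forward_on_device, i, b)
                   for i, b in enumerate(batches)]
        return [f.result() for f in futures]

print(parallel_forward([[1, 2], [3, 4]]))  # -> [[2, 4], [6, 8]]
```

The same pattern extends to the backward pass: run it in the same per-device thread so forward and backward on one GPU stay correctly ordered.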

@Da, feel free to chime in.

Best,
Haibin

On Wed, Nov 28, 2018 at 9:18 AM Taliesin Beynon
 wrote:

> Hello fellow MXNetters
>
> We've seen that the subgraph execution mechanism that is used to run
> things like the foreach operator causes MXExecutorForward to block, instead
> of just issuing the ops in the normal asynchronous way (
> https://github.com/apache/incubator-mxnet/blob/212364b0cba28aeda989378f6e630f7a61749bf3/src/executor/graph_executor.cc#L1352).
> On its own this is a surprising fact that can lead to some issues if you're
> not expecting it, like your time being spent in MXExecutorForward instead
> of WaitAll / WaitRead . Is there a reason that this process isn't just
> automatically done on a separate thread for you? Is it to ensure that
> subsequent ops on the original thread are correctly serialized wrt the ops
> produced by the foreach?
>
> More importantly, this has the unfortunate implication that if you are
> using multi-device parallelism with foreach, by just looping over your
> executors and calling Forward on them, you will inadvertently serialize
> much of the computation: you can't call Forward on the second executor
> until Forward on the first executor has returned, and the foreach causes
> that first Forward call to block until the forward pass is (mostly) done!
>
> So it kills multi-device parallelism unless one starts making thread pools
> so that the one can 'unblock' Forward (and probably the subsequent
> Backward) and have each device's Forward being run in a separate thread.
>
> Is this intended? Are we missing something about how you are supposed to
> use subgraphs in conjunction with multi-device parallelism? It seems like a
> weakness in the current design of subgraph execution. It also appears that
> the python API doesn't have any strategy to deal with this issue, as you
> can see on
> https://github.com/apache/incubator-mxnet/blob/2276bb0e30b1fe601eb288cb4f1b673484892d4b/python/mxnet/executor_manager.py#L281,
> it's not making separate threads or anything there.
>
> Thanks!
> Tali + Sebastian


Re: [Announcement] New Committer -- Rahul Huilgol

2018-12-03 Thread Haibin Lin
Congratulations Rahul. Well deserved!

On Mon, Dec 3, 2018 at 10:18 PM Steffen Rochel 
wrote:

> Congratulation Rahul!
>
> On Mon, Dec 3, 2018 at 10:17 PM Hagay Lupesko  wrote:
>
> > +1 - congrats Rahul!
> >
> > On Mon, Dec 3, 2018 at 8:09 PM kellen sunderland <
> > kellen.sunderl...@gmail.com> wrote:
> >
> > > Congrats Rahul, well deserved.
> > >
> > > On Mon, Dec 3, 2018 at 6:24 PM Tianqi Chen  wrote:
> > >
> > > > Let us welcome Rahul Huilgol as a new Committer of MXNet. He has
> > > > contributed to many fronts, including the FP16 support, distributed
> > > > training and mixed precision support of MXNet. He has a breadth of
> > > > knowledge across multiple modules of the system and would be valuable
> > > > member of the committer team
> > > >
> > > > PRs
> https://github.com/apache/incubator-mxnet/commits?author=rahul003
> > > > Reviews
> > > >
> > > >
> > >
> >
> https://github.com/apache/incubator-mxnet/pulls?utf8=%E2%9C%93=is%3Apr+reviewed-by%3Arahul003
> > > > dev@
> > > >
> > https://lists.apache.org/list.html?d...@mxnet.apache.org:lte=3y:rahul003
> > > >
> > > >
> > > > Tianqi
> > > >
> > >
> >
>


Re: v1.4.0 status 11/29

2018-12-03 Thread Haibin Lin
It would also be great to include the PR that reverts a commit causing cpu
performance degradation https://github.com/apache/incubator-mxnet/pull/13501,
where num_omp_threads decrease to 1 when multiple GPUs are used, as Anirudh
reported in
https://github.com/apache/incubator-mxnet/issues/13449#issuecomment-443388522


Best,
Haibin

On Mon, Dec 3, 2018 at 10:50 AM Afrooze, Sina  wrote:

> I would also like this PR which is already merged with master (
> https://github.com/apache/incubator-mxnet/pull/13426) to be included in
> 1.4.0 to avoid any potential ONNX export issues in cases where the API is
> not used strictly correctly. - Sina
>
>
>
> On 11/30/18, 2:17 PM, "Alex Zai"  wrote:
>
> PR is here https://github.com/apache/incubator-mxnet/pull/13497.
>
> On Thu, Nov 29, 2018 at 8:56 PM Lv, Tao A  wrote:
>
> > Credit belongs to Alex.
> >
> > Hi Alex, would you mind porting your fix to the v1.4.x branch?
> >
> > Thanks,
> > -Tao
> >
> > -Original Message-
> > From: Steffen Rochel [mailto:steffenroc...@gmail.com]
> > Sent: Friday, November 30, 2018 12:48 PM
> > To: dev@mxnet.incubator.apache.org
> > Subject: Re: v1.4.0 status 11/29
> >
> > Hi Tao - thanks for fixing the crash. Please create PR on v1.4.x
> branch
> > with [v1.4.x] in title and add me to the PR.
> > Steffen
> >
> > On Thu, Nov 29, 2018 at 8:44 PM Lv, Tao A 
> wrote:
> >
> > > Hi Steffen, I would like to have
> > > https://github.com/apache/incubator-mxnet/pull/13433  into the
> coming
> > > 1.4.0 release. It fixed a crash of deconvolution with certain input
> > > size for MKL-DNN backend. This PR is well reviewed and already
> merged
> > > into the master branch. New test case is also included there.
> > >
> > > Please find the corresponding issue here:
> > > https://github.com/apache/incubator-mxnet/issues/13421 .
> > >
> > > Thanks,
> > > -Tao
> > >
> > > -Original Message-
> > > From: Steffen Rochel [mailto:steffenroc...@gmail.com]
> > > Sent: Friday, November 30, 2018 12:05 PM
> > > To: dev@mxnet.incubator.apache.org
> > > Subject: v1.4.0 status 11/29
> > >
> > > Dear MXNet community -
> > > I would like to provide update on v1.4.0 status, details will be
> > > tracked here <
> > >
> https://cwiki.apache.org/confluence/display/MXNET/Apache+MXNet+%28incu
> > > bating%29+1.4.0+Release+Plan+and+Status
> > > >
> > > .
> > >
> > > 1. Sergey created v1.4.x branch
> > > 2. As expected, additional requests have been made for inclusion in
> > > v1.4.0 release. Critical PR are tracked here <
> > >
> https://cwiki.apache.org/confluence/display/MXNET/Apache+MXNet+%28incu
> > >
> bating%29+1.4.0+Release+Plan+and+Status#ApacheMXNet(incubating)1.4.0Re
> > > leasePlanandStatus-OpenPRstotrack
> > > >
> > > .
> > > 3. PR to update README.md is blocked by flaky test failures,
> > > retriggered check.
> > > 4. PR to upgrade version on master to v1.5.0 has been submitted.
> > > 5. CI is setup and first run passed.
> > >
> > > Note: if you want to add selected fixes or enhancements, please
> reply
> > > to this email. Please provide justification, add me as approver to
> the
> > > v1.4.x PR and make sure your changes have tests included in PR and
> get
> > > properly reviewed.
> > >
> > > Regards,
> > > Steffen
> > >
> >
>
>
>
>


Re: [Announce] Upcoming Apache MXNet (incubating) 1.3.1 patch release

2018-11-06 Thread Haibin Lin
Hi Naveen and Anton,

Thanks for pointing that out. You are right that these are not critical
fixes. Putting them in 1.4.0 is more appropriate. PRs are closed.

Best,
Haibin

On Tue, Nov 6, 2018 at 7:35 AM Naveen Swamy  wrote:

> Please note that this is a patch release (1.3.1) to address critical bugs!
> For everything else please wait for 1.4.0 which is planned very shortly
> after 1.3.1
>
> > On Nov 6, 2018, at 7:17 AM, Anton Chernov  wrote:
> >
> > The following PR's have been created so far:
> >
> > Infer dtype in SymbolBlock import from input symbol (v1.3.x)
> > https://github.com/apache/incubator-mxnet/pull/13117
> >
> > [MXNET-953] Fix oob memory read (v1.3.x)
> > https://github.com/apache/incubator-mxnet/pull/13118
> >
> > [MXNET-969] Fix buffer overflow in RNNOp (v1.3.x)
> > https://github.com/apache/incubator-mxnet/pull/13119
> >
> > [MXNET-922] Fix memleak in profiler (v1.3.x)
> > https://github.com/apache/incubator-mxnet/pull/13120
> >
> > Set correct update on kvstore flag in dist_device_sync mode (v1.3.x)
> > https://github.com/apache/incubator-mxnet/pull/13121
> >
> > update mshadow (v1.3.x)
> > https://github.com/apache/incubator-mxnet/pull/13122
> >
> > CudnnFind() usage improvements (v1.3.x)
> > https://github.com/apache/incubator-mxnet/pull/13123
> >
> > Fix lazy record io when used with dataloader and multi_worker > 0
> (v1.3.x)
> > https://github.com/apache/incubator-mxnet/pull/13124
> >
> >
> > As stated previously I would be rather opposed to have following PR's it
> in
> > the patch release:
> >
> > Gluon LSTM Projection and Clipping Support (#13055) v1.3.x
> > https://github.com/apache/incubator-mxnet/pull/13129
> >
> > sample_like operators (#13034) v1.3.x
> > https://github.com/apache/incubator-mxnet/pull/13130
> >
> >
> > Best
> > Anton
> >
> > Tue, 6 Nov 2018 at 16:06, Anton Chernov:
> >
> >> Hi Haibin,
> >>
> >> I have a few comments regarding the proposed performance improvement
> >> changes.
> >>
> >> CUDNN support for LSTM with projection & clipping
> >> https://github.com/apache/incubator-mxnet/pull/13056
> >>
> >> There is no doubt that this change brings value, but I don't see it as a
> >> critical bug fix. I would rather leave it for the next major release.
> >>
> >> sample_like operators
> >> https://github.com/apache/incubator-mxnet/pull/13034
> >>
> >> Even if it's related to performance, this is an addition of
> functionality
> >> and I would also push this to be in the next major release only.
> >>
> >>
> >> Best
> >> Anton
> >>
> >>
> >> Tue, 6 Nov 2018 at 15:55, Anton Chernov:
> >>
> >>> Hi Patric,
> >>>
> >>> This change was listed in the 'PR candidates suggested for
> consideration
> >>> for v1.3.1 patch release' section [1].
> >>>
> >>> You are right, I also think that this is not a critical hotfix change
> >>> that should be included into the 1.3.1 patch release.
> >>>
> >>> Thus I'm not making any further efforts to bring it in.
> >>>
> >>> Best
> >>> Anton
> >>>
> >>> [1]
> >>>
> https://cwiki.apache.org/confluence/display/MXNET/Project+Proposals+for+next+MXNet+Release#PR_candidates
> >>>
> >>>
> >>> Tue, 6 Nov 2018 at 1:14, Zhao, Patric:
> >>>
>  Hi Anton,
> 
>  Thanks for looking into the MKL-DNN PR.
> 
>  As my understanding of cwiki (
> 
> https://cwiki.apache.org/confluence/display/MXNET/Project+Proposals+for+next+MXNet+Release
>  ),
>  these features will go into 1.4 rather than patch release of 1.3.1.
> 
>  Feel free to correct me :)
> 
>  Thanks,
> 
>  --Patric
> 
> > -Original Message-
> > From: Anton Chernov [mailto:mecher...@gmail.com]
> > Sent: Tuesday, November 6, 2018 3:11 AM
> > To: d...@mxnet.apache.org
> > Subject: Re: [Announce] Upcoming Apache MXNet (incubating) 1.3.1
> patch
> > release
> >
> > It seems that there is a problem porting following changes to the
>  v1.3.x
> > release branch:
> >
> > Implement mkldnn convolution fusion and quantization
> > https://github.com/apache/incubator-mxnet/pull/12530
> >
> > MKL-DNN Quantization Examples and README
> > https://github.com/apache/incubator-mxnet/pull/12808
> >
> > The bases are different.
> >
> > I would need help from authors of these changes to make a backport
> PR.
> >
> > @ZhennanQin, @xinyu-intel would you be able to assist me and create
> the
> > corresponding PR's?
> >
> > Without proper history and domain knowledge I would not be able to
>  create
> > them by my own in reasonable amount of time, I'm afraid.
> >
> > Best regards,
> > Anton
> >
> > Mon, 5 Nov 2018 at 19:45, Anton Chernov:
> >
> >>
> >> As part of:
> >>
> >> Implement mkldnn convolution fusion and quantization
> >> https://github.com/apache/incubator-mxnet/pull/12530
> >>
> >> I propose to add the examples and documentation PR as well:
> >>
> >> MKL-DNN 

Re: [Announce] Upcoming Apache MXNet (incubating) 1.3.1 patch release

2018-11-05 Thread Haibin Lin
Hi Anton,

Thanks for driving the patch release. Besides the MKL improvements, I
suggest we include two changes for *performance improvement* for NLP tasks
below:

CUDNN support for LSTM with projection & clipping:
- https://github.com/apache/incubator-mxnet/pull/13056
- It is used in state of the art language models such as BIG-LSTM [1] and
Elmo (ACL 2018 best paper) [2]

sample_like operators:
- https://github.com/apache/incubator-mxnet/pull/13034
- Many models require candidate sampling (e.g. word2vec [3], fasttext [4])
for training. The sample_like operators enable drawing random samples
without shape information, so the candidate sampling blocks can now
be hybridized and significantly accelerated.
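For illustration, the candidate (negative) sampling step that word2vec-style training relies on can be sketched in plain Python. This is a hypothetical stand-alone sketch of the technique, not the MXNet operator itself; the `_like` operators make the equivalent draw follow another array's shape, which is what lets the block be hybridized.

```python
import random

def sample_negatives(vocab_size, num_samples, exclude, rng):
    # Draw candidate (negative) word indices uniformly, rejecting the true
    # target index -- the core of candidate sampling for word2vec training.
    negatives = []
    while len(negatives) < num_samples:
        cand = rng.randrange(vocab_size)
        if cand != exclude:
            negatives.append(cand)
    return negatives

rng = random.Random(0)
print(sample_negatives(vocab_size=10, num_samples=5, exclude=3, rng=rng))
```

In real training the distribution is usually a smoothed unigram distribution rather than uniform, but the rejection structure is the same.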

If there is no concern I will open two PRs for the above two changes to
1.3.x branch. Thanks!

Best,
Haibin

[1] https://arxiv.org/pdf/1602.02410.pdf
[2] https://arxiv.org/pdf/1802.05365.pdf
[3]
https://papers.nips.cc/paper/5021-distributed-representations-of-words-and-phrases-and-their-compositionality.pdf

[4] https://arxiv.org/pdf/1607.01759.pdf

On Mon, Nov 5, 2018 at 11:11 AM Anton Chernov  wrote:

> It seems that there is a problem porting following changes to the v1.3.x
> release branch:
>
> Implement mkldnn convolution fusion and quantization
> https://github.com/apache/incubator-mxnet/pull/12530
>
> MKL-DNN Quantization Examples and README
> https://github.com/apache/incubator-mxnet/pull/12808
>
> The bases are different.
>
> I would need help from authors of these changes to make a backport PR.
>
> @ZhennanQin, @xinyu-intel would you be able to assist me and create the
> corresponding PR's?
>
> Without proper history and domain knowledge I would not be able to create
> them by my own in reasonable amount of time, I'm afraid.
>
> Best regards,
> Anton
>
> Mon, 5 Nov 2018 at 19:45, Anton Chernov:
>
> >
> > As part of:
> >
> > Implement mkldnn convolution fusion and quantization
> > https://github.com/apache/incubator-mxnet/pull/12530
> >
> > I propose to add the examples and documentation PR as well:
> >
> > MKL-DNN Quantization Examples and README
> > https://github.com/apache/incubator-mxnet/pull/12808
> >
> >
> > Best regards,
> > Anton
> >
> > Mon, 5 Nov 2018 at 19:02, Anton Chernov:
> >
> >> Dear MXNet community,
> >>
> >> I will be the release manager for the upcoming 1.3.1 patch release.
> >> Naveen will be co-managing the release and providing help from the
> >> committers side.
> >>
> >> The following dates have been set:
> >>
> >> Code Freeze: 31st October 2018
> >> Release published: 13th November 2018
> >>
> >> Release notes have been drafted here [1].
> >>
> >>
> >> * Known issues
> >>
> >> Update MKL-DNN dependency
> >> https://github.com/apache/incubator-mxnet/pull/12953
> >>
> >> This PR hasn't been merged even to master yet. Requires additional
> >> discussion and merge.
> >>
> >> distributed kvstore bug in MXNet
> >> https://github.com/apache/incubator-mxnet/issues/12713
> >>
> >> > When distributed kvstore is used, by default gluon.Trainer doesn't
> work
> >> with mx.optimizer.LRScheduler if a worker has more than 1 GPU. To be
> more
> >> specific, the trainer updates once per GPU, the LRScheduler object is
> >> shared across GPUs and get a wrong update count.
> >>
> >> This needs to be fixed. [6]
> >>
> >>
> >> * Changes
> >>
> >> The following changes will be ported to the release branch, per [2]:
> >>
> >> Infer dtype in SymbolBlock import from input symbol [3]
> >> https://github.com/apache/incubator-mxnet/pull/12412
> >>
> >> [MXNET-953] Fix oob memory read
> >> https://github.com/apache/incubator-mxnet/pull/12631
> >>
> >> [MXNET-969] Fix buffer overflow in RNNOp
> >> https://github.com/apache/incubator-mxnet/pull/12603
> >>
> >> [MXNET-922] Fix memleak in profiler
> >> https://github.com/apache/incubator-mxnet/pull/12499
> >>
> >> Implement mkldnn convolution fusion and quantization (MXNet Graph
> >> Optimization and Quantization based on subgraph and MKL-DNN proposal
> [4])
> >> https://github.com/apache/incubator-mxnet/pull/12530
> >>
> >> Following items (test cases) should be already part of 1.3.0:
> >>
> >> [MXNET-486] Create CPP test for concat MKLDNN operator
> >> https://github.com/apache/incubator-mxnet/pull/11371
> >>
> >> [MXNET-489] MKLDNN Pool test
> >> https://github.com/apache/incubator-mxnet/pull/11608
> >>
> >> [MXNET-484] MKLDNN C++ test for LRN operator
> >> https://github.com/apache/incubator-mxnet/pull/11831
> >>
> >> [MXNET-546] Add unit test for MKLDNNSum
> >> https://github.com/apache/incubator-mxnet/pull/11272
> >>
> >> [MXNET-498] Test MKLDNN backward operators
> >> https://github.com/apache/incubator-mxnet/pull/11232
> >>
> >> [MXNET-500] Test cases improvement for MKLDNN on Gluon
> >> https://github.com/apache/incubator-mxnet/pull/10921
> >>
> >> Set correct update on kvstore flag in dist_device_sync mode (as part of
> >> fixing [5])
> >> https://github.com/apache/incubator-mxnet/pull/12786
> >>
> >> upgrade mshadow version
> >> 

Re: [VOTE] Separating PMC and Committership

2018-11-05 Thread Haibin Lin
Hi Carin,

Thank you very much for driving this forward!

+1

Best,
Haibin

On Mon, Nov 5, 2018 at 9:54 AM Sebastian  wrote:

> +1 (binding)
>
> On 05.11.18 11:29, Carin Meier wrote:
> > This is a procedural vote on whether to separate the committer and PPMC
> > levels in the project. The current state is that a user is considered as
> > both a committer and a PPMC member at the same time. This vote is to
> change
> > that to be able to invite a person in as a committer separately from a
> PPMC
> > member.
> >
> > Document reference:
> >
> https://cwiki.apache.org/confluence/display/MXNET/Become+an+Apache+MXNet+%28incubating%29+Committer+and+PPMC+Member
> >
> > Discussion thread:
> >
> https://lists.apache.org/thread.html/9c6ecda02e081aa6b689c92badc9dcf05ced6fb3691fd370471773d1@%3Cdev.mxnet.apache.org%3E
> >
> > The vote will be a procedural issue vote as defined
> > https://www.apache.org/foundation/voting.html
> >
> > Votes on procedural issues follow the common format of majority rule
> unless
> > otherwise stated. That is, if there are more favourable votes than
> > unfavourable ones, the issue is considered to have passed -- regardless
> of
> > the number of votes in each category. (If the number of votes seems too
> > small to be representative of a community consensus, the issue is
> typically
> > not pursued. However, see the description of lazy consensus
> >  for a
> > modifying factor.)
> >
> > The vote will run until Friday Nov 9th at 6:00 am EST
> >
> > Thanks,
> > Carin
> >
>


[Discussion] Separating PMC and Committership

2018-10-09 Thread Haibin Lin
Dear MXNet community,

In the past when we invite a person to become a committer, he/she is
automatically made a PMC member. However, a lot of communities keep a small
PMC and a bigger, more diverse set of committers to enrich the community. This
has the benefit of having two opportunities to encourage contribution. This
can also help lower the bar for inviting committers, which helps build
consensus in our already large PMC. I'd like to propose the following:

For active contributors we first invite them to become our committers.
Later on as they make significant contribution, we can invite them to PMC.


===
Comments from Marco:

That's a great idea!

The hard question is how to differentiate between a committer and a PMC
member and where we set the bar for each. If I understand you right, you
are proposing to honor active contributions by volume (or another similar
metric). While I think that's a good idea in general, I have a few thoughts:

We definitely have a lot of active people in the project, but let's say
that they contribute a substantial amount, but their contributions can't go
in as-is because they lack quality, consistency, testing or they don't
match with the overall style and best practices. For a code-committer, this
would still be a no-go in my opinion. That person would still require some
guidance and mentoring until they are aligned with the project style and
guidelines as otherwise they might accept low-quality PRs. I know we can
revert that, but let's avoid confrontation as much as possible.

The minimum bar for a code committer would then be:
- (almost) unaltered acceptance of their PRs (of course, some PRs are
intentionally made for discussions and those would even be a plus!)
- following MXNet's community guidelines, rules and styles
- giving useful reviews (in order to see how they would be as reviewers if
they were a committer)
These would be weighted differently on a case-by-case basis, but this could be
a starting point to describe what we are looking for.

From committer to PMC, on the other hand, the difference is quite small.
Something I personally would be looking for are three things:
- judgement
- community engagement
- Apache way
While a committer might be chosen due to their contributions, they wouldn't
be evaluated that strictly for the above points. A PMC member is a
representative of the project who steers the long term development of it.
Thus, they should be active on our channels like dev@, make good reviews on
GitHub (if applicable), express good judgement and reasoning during votes
and generally show that they are generally helpful to the project on a
non-code level.

These are just some thoughts of mine to help start of this discussions. It
would be good to hear what other people are looking for while evaluating
candidates and if there's anything they would like to highlight.

==

Comments from Carin:

I think it is a good idea. Here is a bit of reasoning behind my thoughts.

*Pros of separating Committer and PMC *
 - It would allow us to bring on more committers than under the previous
criteria, which would help in giving people the tools they need to be
productive.
 - The increased productivity should allow PRs to be reviewed and merged
more quickly.
 - Provide a more welcoming experience for people posting new PRs to have
them processed faster.
 - Also provide an additional layer of membership (PMC) after a committer
to help motivate involvement.

*Cons of separating*
 - There is a possibility of having someone as a committer who isn't as
closely aligned to the standards, and quality suffers.
*Possible Mitigation*
- We do have a robust CI that should ensure that basic functionality
doesn't break.
- Communicate clearly, when a new committer is announced, what the
expected standards of committership are.
- Two votes now need to happen for a person since there are two levels.
   *Possible Mitigation*
- If we are convinced the person would be a good PMC member as well, we
could vote them as both at the same time.

I think it would be a good change to try and see how it works out over a
period of a few months. The nice thing is that if we feel like it isn't
working well, we can always change the process.

==


Best,
Haibin


Re: MXNet Podling Report - October

2018-10-09 Thread Haibin Lin
Hi Sebastian, Markus and Bob,

Do you have time to review MXNet's podling report for October? Is there any
question or concern? Thank you.

Best,
Haibin

On Fri, Oct 5, 2018 at 5:49 AM Michael Wall  wrote:

> Looks good to me. Not sure that Justin is subscribed to the dev so
> including him explicitly here.  Thanks Haibin.
>
> Mike
>
> On Thu, Oct 4, 2018 at 8:12 PM Haibin Lin 
> wrote:
>
> > Hi Justin and Michael,
> >
> > I updated the report with the links to the tutorial summaries:
> > June -
> >
> >
> https://lists.apache.org/thread.html/52f88e9dc7a6a2a1dfa5ad41c469fe2cdd1209a0be2eb345bc2f9a96@%3Cuser.mxnet.apache.org%3E
> > July -
> >
> >
> https://lists.apache.org/thread.html/dea9184350f2fe87ce450722ead28072f763196045f39859190f83f8@%3Cuser.mxnet.apache.org%3E
> > August - https://discuss.mxnet.io/t/apache-mxnet-digest-august-2018/1863
> >
> > Justin, the length of the permanent link is longer than 76 characters.
> > Would this be an issue?
> >
> > Best,
> > Haibin
> >
> >
> >
> > On Thu, Oct 4, 2018 at 1:16 PM Haibin Lin 
> > wrote:
> >
> > > Hi Justin,
> > >
> > > Thanks for the notice. I've reformatted the MXNet section to have at
> most
> > > 76 characters per line. Sorry about the last minute update.
> > >
> > > Best,
> > > Haibin
> > >
> > > On Thu, Oct 4, 2018 at 12:59 PM Justin Mclean 
> > wrote:
> > >
> > >> Hi,
> > >>
> > >> I noticed you have edited the report after the due date and have
> broken
> > >> the formatting a little after I formatted it. Each line must have a
> > maximum
> > >> of 76 characters, would you mind fixing your section of the report?
> > >>
> > >> Thanks,
> > >> Justin
> > >>
> > >
> >
>


Re: MXNet Podling Report - October

2018-10-04 Thread Haibin Lin
Hi Justin and Michael,

I updated the report with the links to the tutorial summaries:
June -
https://lists.apache.org/thread.html/52f88e9dc7a6a2a1dfa5ad41c469fe2cdd1209a0be2eb345bc2f9a96@%3Cuser.mxnet.apache.org%3E
July -
https://lists.apache.org/thread.html/dea9184350f2fe87ce450722ead28072f763196045f39859190f83f8@%3Cuser.mxnet.apache.org%3E
August - https://discuss.mxnet.io/t/apache-mxnet-digest-august-2018/1863

Justin, the length of the permanent link is longer than 76 characters.
Would this be an issue?

Best,
Haibin



On Thu, Oct 4, 2018 at 1:16 PM Haibin Lin  wrote:

> Hi Justin,
>
> Thanks for the notice. I've reformatted the MXNet section to have at most
> 76 characters per line. Sorry about the last minute update.
>
> Best,
> Haibin
>
> On Thu, Oct 4, 2018 at 12:59 PM Justin Mclean  wrote:
>
>> Hi,
>>
>> I noticed you have edited the report after the due date and have broken
>> the formatting a little after I formatted it. Each line must have a maximum
>> of 76 characters, would you mind fixing your section of the report?
>>
>> Thanks,
>> Justin
>>
>


Re: MXNet Podling Report - October

2018-10-04 Thread Haibin Lin
Hi Justin,

Thanks for the notice. I've reformatted the MXNet section to have at most
76 characters per line. Sorry about the last minute update.
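As a generic aside for future report editors (not something discussed in this
thread): wrapping a report section to the 76-character limit can be done
mechanically, e.g. with Python's standard textwrap module. A minimal sketch:

```python
import textwrap


def wrap_report(text, width=76):
    # Wrap each paragraph to `width` columns, preserving the blank
    # lines between paragraphs (the podling-report convention).
    paragraphs = text.split("\n\n")
    return "\n\n".join(textwrap.fill(p, width=width) for p in paragraphs)


section = ("MXNet had several new mentors added to help with diversity "
           "among its mentors. ") * 3
wrapped = wrap_report(section)
# Every emitted line now fits within 76 columns.
assert all(len(line) <= 76 for line in wrapped.splitlines())
print(wrapped)
```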

Best,
Haibin

On Thu, Oct 4, 2018 at 12:59 PM Justin Mclean  wrote:

> Hi,
>
> I noticed you have edited the report after the due date and have broken
> the formatting a little after I formatted it. Each line must have a maximum
> of 76 characters, would you mind fixing your section of the report?
>
> Thanks,
> Justin
>


Re: MXNet Podling Report - October

2018-10-04 Thread Haibin Lin
Hi Michael,

Thanks for reviewing the report! I've so far updated the report based on
question 1,2, 4 and 5.

1. None.
2. Links:

https://medium.com/apache-mxnet

https://twitter.com/ApacheMXNet

https://www.youtube.com/apachemxnet

https://www.slideshare.net/apachemxnet

https://www.reddit.com/r/mxnet/
3. I'll send out the link to the collection of tutorials shortly.
4. The numbers have been captured from
https://github.com/apache/incubator-mxnet/pulse/monthly at the end of each
month.
Today, covering 9/4-10/4, the summary shows:
Excluding merges, 48 authors have pushed 126 commits to master and 134
commits to all branches. On master, 429 files have changed and there have
been 13,458 additions and 5,975 deletions.
Unfortunately, Github doesn’t provide an API to capture information for
specific time frames. If there is a preferred or standard way in Apache to
capture the changes, we would like to learn more about it.
As in previous reports we only captured changes from the incubator-mxnet
repo. incubator-mxnet-site is generated through CI from the incubator-mxnet
repo and contains static content for http://mxnet.incubator.apache.org/

5 - I added the statement from the top of the report: MXNet had several
new mentors added to help with diversity among its mentors.

Best,
Haibin

On Tue, Oct 2, 2018 at 6:44 PM Michael Wall  wrote:

> Hi Haibin,
>
> A couple of things I thought of when reviewing the report
> 1 - No answer for the question "Any issues that the Incubator PMC (IPMC) or
> ASF Board wish/need to be aware of?"
> 2 - What are the links to the medium blog, the youtube channel and the
> twitter account?
> 3 - Where did the 62 tutorials come from?  Mostly my own interest to see
> them.
> 4 - How were the github stats calculated?  For Sep 2018 issues created, the
> report read 87 but I come up with 112 (the sum of open and closed  from
>
> https://github.com/apache/incubator-mxnet/issues?utf8=%E2%9C%93=is%3Aissue+created%3A2018-09-01..2018-09-30+
> ).
> For Sep 2018 issues closed the report read 124 where I get 110 (
>
> https://github.com/apache/incubator-mxnet/issues?utf8=%E2%9C%93=is%3Aissue+closed%3A2018-09-01..2018-09-30+
> ).
> But that is only for the incubator-mxnet repo, it doesn't include
> incubator-mxnet-site or incubator-mxnet-test.  The links could be included
> in the report as well which might help to generate the numbers next time.
> 5 - You mention new mentors were added, but not why.  If the board asks, it
> will be a follow up task.
>
> Mike
>
> On Mon, Oct 1, 2018 at 8:33 PM Haibin Lin 
> wrote:
>
> > Hi MXNet community,
> >
> > The podling report for MXNet is due on October 3rd. The report covers
> > MXNet's progress on community development and project development (the
> > previous one can be found here <
> https://wiki.apache.org/incubator/July2018
> > >).
> > You can search "MXNet" at https://wiki.apache.org/incubator/October2018
> > for
> > MXNet's draft report for October. Please help review and contribute to
> the
> > report before it's due.
> >
> > If you have any suggestions on improving the report, please let me know
> and
> > I'm happy to update the report based on the feedback. Thanks!
> >
> > Best regards,
> > Haibin
> >
>


MXNet Podling Report - October

2018-10-01 Thread Haibin Lin
Hi MXNet community,

The podling report for MXNet is due on October 3rd. The report covers
MXNet's progress on community development and project development (the
previous one can be found here ).
You can search "MXNet" at https://wiki.apache.org/incubator/October2018 for
MXNet's draft report for October. Please help review and contribute to the
report before it's due.

If you have any suggestions on improving the report, please let me know and
I'm happy to update the report based on the feedback. Thanks!

Best regards,
Haibin


Re: [Discuss] Next MXNet release

2018-10-01 Thread Haibin Lin
I found 2 bugs related to gluon Trainer with distributed KVStore. Basically
if someone uses Gluon for distributed training with a learning rate
schedule (e.g. train ResNet50 for image classification), it won't work.

https://github.com/apache/incubator-mxnet/issues/12713

I have the fix for the first bug locally, but I don't have the fix for the
second one.

Best,
Haibin

On Mon, Oct 1, 2018 at 10:14 AM Afrooze, Sina  wrote:

> This post suggests there is a regression from 1.1.0 to 1.2.1 related to
> MKLDNN integration:
> https://discuss.mxnet.io/t/mxnet-1-2-1-module-get-outputs/1882
>
> The error is related to MKLDNN layout not being converted back to MXNet
> layout in some operator: " !IsMKLDNNData() We can’t generate TBlob for
> MKLDNN data. Please use Reorder2Default() to generate a new NDArray first"
>
> Sina
>
>
>
>
> On 9/30/18, 6:55 PM, "Steffen Rochel"  wrote:
>
> Thanks Patrick.
> Updated roadmap and next release content.
>
> Patrick - suggest to send a reminder to review the design doc and
> collect
> feedback.
> Are there still known issues or gaps before we declare MKL-DNN
> integration
> as GA?
>
> Regards,
> Steffen
>
> On Sat, Sep 29, 2018 at 1:31 AM Zhao, Patric 
> wrote:
>
> > Thanks, Steffen.
> >
> > Regarding the next release note, two items from our side:
> >
> > 1. (-remove) MKL-DNN integration is done. I think we can remove this
> item.
> > 2. (+add) MKL-DNN based graph optimization and quantization by
> subgraph
> > Design doc:
> >
> https://cwiki.apache.org/confluence/display/MXNET/MXNet+Graph+Optimization+and+Quantization+based+on+subgraph+and+MKL-DNN
> > Lead Contributor: Patric Zhao,
> https://github.com/pengzhao-intel/
> >
> > Regarding the Roadmap
> > (+add) Q1 2019: MKL-DNN RNN API supports
> >
> > BR,
> >
> > Thanks,
> >
> > --Patric
> >
> >
> > > -Original Message-
> > > From: kellen sunderland [mailto:kellen.sunderl...@gmail.com]
> > > Sent: Saturday, September 29, 2018 11:31 AM
> > > To: dev@mxnet.incubator.apache.org
> > > Subject: Re: [Discuss] Next MXNet release
> > >
> > > Sorry I meant to say next 'Regarding the *minor* release'.
> > >
> > > On Sat, Sep 29, 2018 at 5:27 AM kellen sunderland <
> > > kellen.sunderl...@gmail.com> wrote:
> > >
> > > > Thanks for transparently setting a rough timeline Steffen.  I
> think
> > > > this will go a long way in helping the community plan their
> work, even
> > > > if the details change somewhat on the road to the release.
> > > >
> > > > Regarding the major release: I would propose we unify TensorRT
> with
> > > > the subgraph operator work.
> > > >
> > > > Regarding the patch release:  There were a few minor stack/buffer
> > > > overflows exposed by ASAN that have been addressed.  It's
> probably a
> > > > good idea to include them in a patch release, as they at best
> result
> > > > in non-deterministic behaviour.
> > > >
> > > > -Kellen
> > > >
> > > >
> > > > On Sat, Sep 29, 2018 at 1:39 AM Steffen Rochel
> > > > 
> > > > wrote:
> > > >
> > > >> I updated
> > > >>
> > > >>
> https://cwiki.apache.org/confluence/display/MXNET/Project+Proposals+f
> > > >> or+next+MXNet+Release
> > > >> ,
> > > >> removed the completed items from 1.3 release and would like to
> kick
> > > >> off discussion about the next release. Please suggest what you
> would
> > > >> like to see included in the next release together with link to
> design
> > > >> proposal (appropriately for the size and complexity of the
> proposal)
> > > >> or suggest changes.
> > > >> I suggest to target the next release for December 2018 to frame
> the
> > > >> discussion.
> > > >> Lets include review of
> > > >> https://cwiki.apache.org/confluence/display/MXNET/MXNet+Roadmap
> -
> > > >> time to update and discuss changes.
> > > >>
> > > >> From the 1.3 release we had discussion regarding
> > > >> https://github.com/apache/incubator-mxnet/issues/11849 and
> resolution
> > > >> in
> > > >> https://github.com/apache/incubator-mxnet/pull/12412 .
> > > >> Are you aware of critical issues and feedback from user which we
> > > >> should consider for a potential 1.3.1 patch release. Should we
> > > >> include PR 12412 in a potential patch release?
> > > >>
> > > >> Regards,
> > > >> Steffen
> > > >>
> > > >
> >
>
>
>
>


Re: [VOTE] Release MXNet version 1.3.0.RC0

2018-09-06 Thread Haibin Lin
+1 built from source and passes dist_sync_kvstore test on Ubuntu.

Best,
Haibin

On Thu, Sep 6, 2018 at 1:32 PM Indhu  wrote:

> +1
>
> The release candidate looks good. I'm able to build and run basic models.
>
> One the FP16 issue:
>
> Like others have pointed out, releases are expensive in terms of time and
> effort. There needs to be a higher and more objective bar on what qualifies
> as a release blocker to make sure we are not setting precedence for a lot
> of release blockers in future.
>
> I think a release blocker is justified only if there is a serious bug
> discovered in one of the features included in the release or if there is a
> regression. Given FP16 support is not a new feature claimed in this
> release and this is not a regression in this release candidate, I'm
> inclined to release this candidate and include the FP16 fix in a subsequent
> release.
>
> Thanks,
> Indu
>
> On Wed, Sep 5, 2018 at 10:21 AM Aaron Markham 
> wrote:
>
> > 0 (non-binding) If we have a problem that blocks users, and a solution in
> > hand... then we should fix it, but not at the expense of starting the
> > release cycle again just for one fix. Users can cherry pick or build from
> > master if they want the fix right away, right? I'd change my mind to -1
> if
> > this wasn't the case, with good reason, and if the user impact was
> critical
> > to adoption or risks abandonment.
> >
> >
> > On Wed, Sep 5, 2018 at 9:57 AM Roshani Nagmote <
> roshaninagmo...@gmail.com>
> > wrote:
> >
> > > I believe everyone here is working hard to make MXNet a better
> framework
> > > for users. It's completely okay to have different opinions, we can
> decide
> > > together if this issue is a blocker or not after voting time is over.
> > >
> > > As I mentioned before, voting will end at 7 pm today. So there is still
> > > time to test the release. If there are any other issues anyone finds, I
> > > will be happy to start the process again and work on RC1. For now, I
> want
> > > to encourage everyone to utilize this time and vote. :)
> > >
> > > Thanks,
> > > Roshani
> > >
> > > On Tue, Sep 4, 2018 at 10:35 PM sandeep krishnamurthy <
> > > sandeep.krishn...@gmail.com> wrote:
> > >
> > > >1. As a Apache MXNet community member, I raised the concern of
> > broken
> > > >functionality for the user. I explained and provided the data
> points
> > > on
> > > > the
> > > >issue, workaround and why I think it is important. If after all
> > this,
> > > > you
> > > >think my vote is biased on my employer just because a user I
> quoted
> > is
> > > > from
> > > >Amazon, this is more concerning to me on my voting abilities.
> > > >2. My -1 no where undermines the huge amount of effort that goes
> > > behind
> > > >the scene for a release to happen. Great respect and recognition
> for
> > > >everyone involved in all the releases of MXNet in the past and
> > this. I
> > > >voted on my judgement of what may be good for the users of MXNet.
> > > >3. As pointed by Naveen & Chris, -1 are NOT veto. Feel free to
> > decide
> > > >and progress on the release as we already have >3 +1 in this
> thread.
> > > >
> > > >
> > > > Best,
> > > >
> > > > Sandeep
> > > >
> > > > On Tue, Sep 4, 2018 at 8:29 PM Chris Olivier 
> > > > wrote:
> > > >
> > > > > btw, there are no vetoes on package releases:
> > > > >
> > > > > VOTES ON PACKAGE RELEASES
> > > > > 
> > > > >
> > > > > Votes on whether a package is ready to be released use majority
> > > approval
> > > > > 
> > --
> > > > i.e.
> > > > > at least three PMC members must vote affirmatively for release, and
> > > there
> > > > must be more positive than negative votes. Releases may not be
> vetoed.
> > > > > Generally
> > > > > the community will cancel the release vote if anyone identifies
> > serious
> > > > > problems, but in most cases the ultimate decision, lies with the
> > > > individual
> > > > > serving as release manager. The specifics of the process may vary
> > from
> > > > > project to project, but the 'minimum quorum of three +1 votes' rule
> > is
> > > > > universal.
> > > > >
> > > > > On Tue, Sep 4, 2018 at 7:12 PM Sheng Zha 
> wrote:
> > > > >
> > > > > > Thanks for sharing your opinions, Thomas. Your recognition and
> > > respect
> > > > of
> > > > > > people's efforts on preparing the release candidate are certainly
> > > > > > appreciated.
> > > > > >
> > > > > > Now that the vote is set to fail thanks to the veto, there will
> be
> > > > plenty
> > > > > > of opportunities to include those bug fixes, including the one
> Zhi
> > > > > > mentioned [1], which was already merged in the master and yet
> chose
> > > not
> > > > > to
> > > > > > block this release with [2]. I will be happy to work with Roshani
> > to
> > > > > > prepare another release candidate once ready.
> > > > > >
> > > > > > -sz
> > > > > >
> > > > > > [1]
> 

Re: Consolidating developer guide in one place (cwiki preferred)

2018-08-15 Thread Haibin Lin
+1

On Wed, Aug 15, 2018 at 1:10 PM, Aaron Markham 
wrote:

> Hi Lin, I agree with this organization. If you feel like somethings should
> be transitioned from the website to the wiki, I can help with that, but for
> the moment I've been suggesting that new developer-focused content be
> placed on the wiki.
>
> On Tue, Aug 14, 2018 at 10:40 AM, Lin Yuan  wrote:
>
> > Dear MXNet community,
> >
> > As a developer, I noticed we have some developer guide scattered in
> > different websites (mxnet.io, cwiki):
> >
> > E.g.
> >
> > How to Create New Operators (Layers): [
> > https://mxnet.incubator.apache.org/faq/new_op.html]
> > A Guide to Implementing Sparse Operators in MXNet Backend [
> > https://cwiki.apache.org/confluence/display/MXNET/A+
> > Guide+to+Implementing+Sparse+Operators+in+MXNet+Backend
> > ]
> >
> > When searching developer guide by keyword, only one of them can be
> returned
> > on either site.
> >
> > It will be more convenient for developers if all the developer guide
> > resides on cwiki and all user guide (non-developer) on the mxnet.io
> > website. We can add a link on mxnet.io to refer all developers to cwiki
> > for
> > guidance.
> >
> > Any comment is appreciated.
> >
> > Best Regards,
> >
> > Lin
> >
>


Re: Duplication of Operators for sampling from random distributions

2018-07-24 Thread Haibin Lin
Hi Anirudh,

Thanks for asking this on dev@. I looked at the doc for sample_uniform and
random_uniform, and found that the API is different. For sample_uniform,
the type of arguments `low` and `high` is NDArray, while that of
random_uniform's is float. I don't think they're going to be deprecated.

The recommended API to generate a random number is via the ndarray.random.*
or symbol.random.*, which accept both float and NDArray, and under the hood
invoke either sample_xxx or random_xxx correspondingly.
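To make the dispatch concrete, here is a schematic sketch (NOT the actual
MXNet source; the function bodies and return values are placeholders) of how
a unified random.uniform front end can route to the two backend operators
depending on whether the distribution parameters are scalars or arrays:

```python
def random_uniform(low, high, shape):
    # Scalar-parameter path: one distribution, `shape` output samples.
    return ("random_uniform", low, high, shape)


def sample_uniform(low, high):
    # Array-parameter path: one distribution per (low, high) pair,
    # sampled concurrently.
    return ("sample_uniform", low, high)


def uniform(low=0.0, high=1.0, shape=None):
    """Front-end entry point: accept floats or arrays and dispatch."""
    if isinstance(low, (int, float)) and isinstance(high, (int, float)):
        return random_uniform(low, high, shape)
    return sample_uniform(low, high)


print(uniform(0.0, 1.0, shape=(2, 2)))   # scalar params -> random_uniform
print(uniform([0.0, 2.0], [1.0, 3.0]))   # array params  -> sample_uniform
```

With a single entry point like this, neither backend operator needs to be
deprecated: both remain reachable, and users only see one API.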

Best,
Haibin

On Mon, Jul 23, 2018 at 1:42 PM, Anirudh Acharya 
wrote:

> Hi All,
>
> I had earlier filed an issue with functionality-duplication/code-refactor
> here - https://github.com/apache/incubator-mxnet/issues/11811
>
> As per the suggestion in the github issue I would like to bring it to the
> attention of the wider community -
>
> The operators defined in sample_op.cc and multisample_op.cc are seemingly
> performing the same tasks. Both these files define the following operators
> respectively
>
> sample_op.cc
> ---
> random_uniform
> random_normal
> random_gamma
> random_exponential
> random_poisson
> random_negative_binomial
> random_generalized_negative_binomial
>
> multisample_op.cc
> --
> sample_uniform
> sample_normal
> sample_gamma
> sample_exponential
> sample_poisson
> sample_negative_binomial
> sample_generalized_negative_binomial
>
> The only difference that I can glean from the documentation is that
> operators in multisample_op.cc perform concurrent sampling from multiple
> distributions, but the behavior of the operators is not different.
>
> Is sample_op.cc being retained for legacy reasons or backward
> compatibility? Can it be deprecated or EOLed? Correct me if I am wrong
> here.
>
>
> Thanks
>
> Anirudh
>


Re: Should MXNet 1.3 contain a buggy version of nn.Embedding backward by default?

2018-07-24 Thread Haibin Lin
Hi Hao,

Did you look at the AddTakeGrad for sparse gradient
https://github.com/apache/incubator-mxnet/blob/master/src/operator/tensor/indexing_op.cu#L77
? If I'm not mistaken, Leonard doesn't see NaN values generated by the
sparse gradient kernel. The sparse kernel shares similar parallelization
strategy with the dense AddTakeGradLargeBatch and can be easily adapted to
replace the dense kernel by removing the "lookup_table" argument of the
kernel.
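For context on what these kernels compute: the backward pass of an
embedding/take lookup is a scatter-add of the output-gradient rows into the
weight-gradient rows selected by the indices, and duplicate indices must
accumulate — which is exactly what a parallel GPU kernel has to make
race-free. A minimal pure-Python sketch of the math (illustrative only, not
the CUDA implementation discussed above):

```python
def take_backward(grad_out, indices, vocab_size):
    """Dense backward of an embedding lookup: scatter-add each
    output-gradient row into the weight-gradient row picked by its
    index. Duplicate indices accumulate; a parallel kernel must
    serialize or atomically perform these `+=` updates."""
    dim = len(grad_out[0])
    grad_weight = [[0.0] * dim for _ in range(vocab_size)]
    for row, idx in zip(grad_out, indices):
        for j, g in enumerate(row):
            grad_weight[idx][j] += g
    return grad_weight


# Index 1 appears twice, so its two gradient rows accumulate.
gw = take_backward([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]],
                   indices=[1, 0, 1], vocab_size=3)
print(gw)  # [[3.0, 4.0], [6.0, 8.0], [0.0, 0.0]]
```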

Best,
Haibin


On Mon, Jul 23, 2018 at 11:45 PM, Hao Jin  wrote:

> Hi all,
> Some preliminary benchmark results have been shared on the related PR, and
> what we've found is that based on the sample benchmark with an input on
> which the LargeBatch version is supposed to have a better performance,
> there was no significant increase in performance compared with either the
> new general backward kernel or the AddTakeGrad function, and the LargeBatch
> version is deemed buggy based on Leo's reproduction example given in the
> original issue. I would propose that we delete the LargeBatch version and
> use the AddTakeGrad version by default. If there's no obvious objection
> then we'll go ahead in that direction.
> Hao
>
> On Mon, Jul 23, 2018 at 9:12 PM, Naveen Swamy  wrote:
>
> > If it is buggy, how does it matter if it is performant or not? I am not
> > seeing the rationale to make the correct version only opt-in.
> >
> >
> > On Mon, Jul 23, 2018 at 6:47 PM, Leonard Lausen <
> > leonard-softw...@lausen.nl>
> > wrote:
> >
> > > Currently the default kernel of nn.Embedding backward is known to be
> > > buggy on P3 instances or using Cuda 9.2 (though the issue also occurs
> on
> > > other instances with earlier version of Cuda, but less often).
> > >
> > > https://github.com/apache/incubator-mxnet/issues/11314
> > >
> > > There is currently an opt-in for using a bug-free kernel, but it is not
> > > the default. However, the bug-free kernel is used by default for shape
> > > smaller 16384.
> > >
> > > Should MXNet ship a more efficient but buggy kernel in v1.3 or use a
> > > correct but less efficient kernel by default? As MXNet v1.3 is likely
> to
> > > be used a lot with Cuda 9.2 I believe the default behavior should be
> > > changed to use the bug-free but less efficient Kernel. Correctness and
> > > providing a good user experience should be No. 1 here (?). Then users
> > > that want a faster but buggy backward kernel can still select to do so.
> > > Note this only affects the backward pass.
> > >
> > > Hao did related work on improving the take operator
> > > https://github.com/apache/incubator-mxnet/pull/11326
> > > https://github.com/apache/incubator-mxnet/pull/11795 which also fixes
> > > the issue, but he found it to be only "slightly faster" compared to the
> > > bug-free kernel that is currently under opt-in while leading to CI
> > > failures on Windows.
> > >
> > > In my experience, there is no speed difference between the current
> buggy
> > > and
> > > opt-in bug-free kernel, but the GPU utilization of the latter is 100%
> > > compared
> > > to 60% of the former (benchmark script:
> > > https://github.com/apache/incubator-mxnet/pull/11795#
> > > issuecomment-405808567 )
> > >
> >
>


Re: [DISCUSS] Subscribe dev@ to Github Activities?

2018-07-12 Thread Haibin Lin
Agree. +1 for more transparency

On Thu, Jul 12, 2018 at 3:27 PM, Zha, Sheng 
wrote:

> My intention is really just to bridge the gap between so much happening on
> github v.s. "whatever didn't happen on dev list didn't happen".
>
> Also, since dev@ is intended to be an asynchronous way for community to
> follow technical conversations, there wasn't really a requirement for
> anyone to read all of them in the first place.
>
> Best regards,
> -sz
>
> On 7/12/18, 3:20 PM, "Timur Shenkao"  wrote:
>
> Flink - yes
> Spark - it was previously but not now
>
> Yeah, amount of messages would be tripled at least: Jira + Github
> issue + PR
>
> On Thu, Jul 12, 2018 at 11:13 PM, Haibin Lin  >
> wrote:
>
> > I'm a bit concerned with the amount of emails flooding in. In the
> past week
> > there're 32 new issues and 35 new pull requests. This means on avg
> 10 email
> > per day and I doubt I'll read all of them.. Does the Spark community
> > subscribe dev@ to github?
> >
> > Best,
> > Haibin
> >
> > On Thu, Jul 12, 2018 at 3:08 PM, Pedro Larroy <
> > pedro.larroy.li...@gmail.com>
> > wrote:
> >
> > > -1   It's a lot of traffic, whomever wants to subscribe can do it
> in
> > > github. I'm afraid it will decrease signal to noise ratio in the
> list.
> > >
> > > On Thu, Jul 12, 2018 at 11:32 PM Lin Yuan 
> wrote:
> > >
> > > > +1
> > > >
> > > > On Thu, Jul 12, 2018 at 12:26 PM Anirudh Acharya <
> > anirudhk...@gmail.com>
> > > > wrote:
> > > >
> > > > > +1
> > > > >
> > > > > On Thu, Jul 12, 2018 at 11:51 AM Piyush Ghai <
> ghai.piy...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > +1
> > > > > > > On Jul 12, 2018, at 11:50 AM, Tianqi Chen <
> > > tqc...@cs.washington.edu>
> > > > > > wrote:
> > > > > > >
> > > > > > > +1
> > > > > > >
> > > > > > > On Thu, Jul 12, 2018 at 11:10 AM, Sheng Zha <
> szha@gmail.com>
> > > > > wrote:
> > > > > > >
> > > > > > >> Hi all,
> > > > > > >>
> > > > > > >> Should we subscribe dev list to github updates on mxnet
> repo?
> > Both
> > > > > > github
> > > > > > >> issues/PRs and the dev list are intended for technical
> > discussions
> > > > and
> > > > > > in
> > > > > > >> that aspect largely share the same goal. Since MXNet has
> most
> > > > activity
> > > > > > >> github, this could help dev@ to become more active. Some
> pros
> > and
> > > > > cons:
> > > > > > >>
> > > > > > >> Pros:
> > > > > > >> - There have been many high quality discussions that
> happen on
> > > > github
> > > > > to
> > > > > > >> which the dev list can benefit.
> > > > > > >> - Replies on update emails are reflected on the specific
> > issue/PR.
> > > > > > >> - Users can also choose to click on the link and go to
> github to
> > > > > > >> participate in discussion.
> > > > > > >> - We still have the ability to carry out dev@ only
> > conversation.
> > > > > > >>
> > > > > > >> Cons:
> > > > > > >> - Higher volume on dev list.
> > > > > > >> - Some discussions might not be suitable for dev@.
> (though I
> > > can't
> > > > > > think
> > > > > > >> of
> > > > > > >> why such conversation should happen on github either)
> > > > > > >>
> > > > > > >> -sz
> > > > > > >>
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
>
>


Re: [DISCUSS] Subscribe dev@ to Github Activities?

2018-07-12 Thread Haibin Lin
I'm a bit concerned with the amount of emails flooding in. In the past week
there're 32 new issues and 35 new pull requests. This means on average 10
emails per day, and I doubt I'll read all of them. Does the Spark community
subscribe dev@ to github?

Best,
Haibin

On Thu, Jul 12, 2018 at 3:08 PM, Pedro Larroy 
wrote:

> -1   It's a lot of traffic, whomever wants to subscribe can do it in
> github. I'm afraid it will decrease signal to noise ratio in the list.
>
> On Thu, Jul 12, 2018 at 11:32 PM Lin Yuan  wrote:
>
> > +1
> >
> > On Thu, Jul 12, 2018 at 12:26 PM Anirudh Acharya 
> > wrote:
> >
> > > +1
> > >
> > > On Thu, Jul 12, 2018 at 11:51 AM Piyush Ghai 
> > > wrote:
> > >
> > > > +1
> > > > > On Jul 12, 2018, at 11:50 AM, Tianqi Chen <
> tqc...@cs.washington.edu>
> > > > wrote:
> > > > >
> > > > > +1
> > > > >
> > > > > On Thu, Jul 12, 2018 at 11:10 AM, Sheng Zha 
> > > wrote:
> > > > >
> > > > >> Hi all,
> > > > >>
> > > > >> Should we subscribe dev list to github updates on mxnet repo? Both
> > > > github
> > > > >> issues/PRs and the dev list are intended for technical discussions
> > and
> > > > in
> > > > >> that aspect largely share the same goal. Since MXNet has most
> > activity
> > > > >> github, this could help dev@ to become more active. Some pros and
> > > cons:
> > > > >>
> > > > >> Pros:
> > > > >> - There have been many high quality discussions that happen on
> > github
> > > to
> > > > >> which the dev list can benefit.
> > > > >> - Replies on update emails are reflected on the specific issue/PR.
> > > > >> - Users can also choose to click on the link and go to github to
> > > > >> participate in discussion.
> > > > >> - We still have the ability to carry out dev@ only conversation.
> > > > >>
> > > > >> Cons:
> > > > >> - Higher volume on dev list.
> > > > >> - Some discussions might not be suitable for dev@. (though I
> can't
> > > > think
> > > > >> of
> > > > >> why such conversation should happen on github either)
> > > > >>
> > > > >> -sz
> > > > >>
> > > >
> > > >
> > >
> >
>


Re: C++ api issue labeling

2018-07-12 Thread Haibin Lin
+1 merging "feature" with "feature request"

On Tue, Jul 10, 2018 at 12:59 PM, Anirudh Acharya 
wrote:

> There is another instance of label duplication - We have labels "Feature" (
> https://github.com/apache/incubator-mxnet/labels/Feature ) and "Feature
> Request" (
> https://github.com/apache/incubator-mxnet/labels/Feature%20request ). I
> don't think there is much difference between these two labels.
>
> It would make sense to merge the "Feature" label into "Feature Request".
>
>
> Thanks
> Anirudh
>
>
> On Wed, Jun 27, 2018 at 3:50 PM Hagay Lupesko  wrote:
>
> > Thank you everyone for your suggestions.
> > I will work with a committer to get this updated ASAP.
> >
> > On Mon, Jun 25, 2018 at 8:55 AM Marco de Abreu
> >  wrote:
> >
> > > +1 to renaming to Backend
> > >
> > > On Mon, Jun 25, 2018 at 10:13 AM Hagay Lupesko 
> > wrote:
> > >
> > > > Thanks Lin for your feedback.
> > > > Bumping again to get more feedback before concluding.
> > > >
> > > > On Fri, Jun 22, 2018 at 8:53 AM Lin Yuan 
> wrote:
> > > >
> > > > > I agree with Hagay. Using "Backend" as label makes it much easier
> to
> > > > track.
> > > > >  "C++" label only describes the language used in implementation,
> > > > "Backend"
> > > > > better describes the nature of the work (let's assume we change the
> > > > backend
> > > > > implementation from C++ to other languages in the future).
> > > > >
> > > > > Lin
> > > > >
> > > > > On Fri, Jun 22, 2018 at 1:09 AM Hagay Lupesko 
> > > wrote:
> > > > >
> > > > > > Thanks everyone for chiming in and clarifying.
> > > > > > It seems that the "C++" label name is confusing for our community
> > > since
> > > > > it
> > > > > > can be interpreted as both the CPP API and the backend...
> > > > > > As an anecdote, this issue [1
> > > > > > ] is
> > labeled
> > > > as
> > > > > > "C++" but is about the CPP API, not the backend.
> > > > > >
> > > > > > Should we just rename "C++" to "Backend" to avoid confusion?
> > > > > >
> > > > > > [1] https://github.com/apache/incubator-mxnet/issues/10937
> > > > > >
> > > > > > On Thu, Jun 21, 2018 at 12:39 PM Pedro Larroy <
> > > > > > pedro.larroy.li...@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > > Agree with Anirudh, they are different things. Maybe change the
> > > "C++"
> > > > > > label
> > > > > > > to "backend" would be more informative?
> > > > > > >
> > > > > > > On Thu, Jun 21, 2018 at 12:11 PM Anirudh <
> anirudh2...@gmail.com>
> > > > > wrote:
> > > > > > >
> > > > > > > > Hi Hagay,
> > > > > > > >
> > > > > > > > I think we should keep these two labels seperate since they
> > mean
> > > > > > > different
> > > > > > > > things.
> > > > > > > > The C++ label refers to the issue for MXNet backend and the
> CPP
> > > > > package
> > > > > > > > refers to the CPP language binding for mxnet.
> > > > > > > > We can still make C++ API great again irrespective by
> filtering
> > > out
> > > > > CPP
> > > > > > > > package issues :).
> > > > > > > >
> > > > > > > > Anirudh
> > > > > > > >
> > > > > > > >
> > > > > > > > On Thu, Jun 21, 2018 at 11:56 AM, Hagay Lupesko <
> > > lupe...@gmail.com
> > > > >
> > > > > > > wrote:
> > > > > > > >
> > > > > > > > > Hey community,
> > > > > > > > >
> > > > > > > > > I was going over the open GitHub issues for MXNet, and
> > noticed
> > > > that
> > > > > > we
> > > > > > > > have
> > > > > > > > > two labels for the CPP API: "CPP package", "C++"
> > > > > > > > >
> > > > > > > > > Wanted to suggest we remove "CPP package" and just stick to
> > > "C++"
> > > > > > > > > This will make it easier for the community to classify
> issues
> > > and
> > > > > > focus
> > > > > > > > on
> > > > > > > > > making the C++ API great again ;)
> > > > > > > > >
> > > > > > > > > Let me know if someone has any concerns, otherwise I will
> > find
> > > a
> > > > > > > > committer
> > > > > > > > > that I can work with to make this change.
> > > > > > > > >
> > > > > > > > > Thanks!
> > > > > > > > > Hagay
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>


Re: [VOTE] Release MXNet version 1.2.1.RC1

2018-07-12 Thread Haibin Lin
+1
Built from source with cuda and dist kvstore. Ran dist_sync_kvstore.py
nightly test and it passed.

Best,
Haibin

On Wed, Jul 11, 2018 at 6:13 PM, Roshani Nagmote 
wrote:

> Hi All,
>
> Could you please test and vote for this release? Voting will end tomorrow
> by 5:50 pm PDT.
>
> Thanks,
> Roshani
>
> On Mon, Jul 9, 2018 at 4:53 PM Roshani Nagmote 
> wrote:
>
> > Hi all,
> >
> > I would like to propose a vote to release Apache MXNet (incubating)
> > version
> > 1.2.1.RC1. Voting will start now (Monday, Jul 9th) and end at 5:50 PM
> > PDT, Thursday, July 12th.
> >
> > Link to release candidate 1.2.1.rc1:
> > *https://github.com/apache/incubator-mxnet/releases/tag/1.2.1.rc1
> > *
> >
> > View this page, click on "Build from Source", and use the source code
> > obtained from 1.2.1.rc1 tag:
> > https://mxnet.incubator.apache.org/install/index.html
> >
> > (Note: The README.md points to the 1.2.1 tag and does not work at the
> > moment.)
> >
> > Please remember to test first before voting accordingly:
> >
> > +1 = approve
> > +0 = no opinion
> > -1 = disapprove (provide reason)
> >
> > Thanks,
> > Roshani
> >
>


Re: [VOTE] Release MXNet version 1.2.1.RC0 (Patch Release)

2018-06-21 Thread Haibin Lin
+1

Built from source with CUDA on Ubuntu.

Ran example/gluon/word_language_model/train.py

Best,
Haibin


On Thu, Jun 21, 2018 at 11:08 AM, Anirudh  wrote:

> Hi Pedro,
>
> I think you raised this issue in 1.2.0 release here:
> https://lists.apache.org/thread.html/ddc088a21aac179144350ea97353a7
> ea885b2765ccb98db08a03ba2d@%3Cdev.mxnet.apache.org%3E
> .
> I actually forgot about this issue during this release. Having said that, I
> think since this works with make and the customers using cmake with
> USE_OPENMP=OFF should be considerably small we should not block the release
> for this.
> The main reason we are doing this release is for this issue
>  . Now pulling in
> this change for the cmake fix would also mean we need to pull 8 more
> commits from dmlc-core, and that is a considerable risk to introduce in a
> patch release.
> This would also mean cutting another rc. I think in the interest of our
> customers who are eagerly waiting for the patch release to fix the main
> issue, we should move ahead here.
> I missed reviewing all the known issues of 1.2.0 and adding them to the
> 1.2.1 release notes. I will do that now.
>
> Anirudh
>
>
>
> On Thu, Jun 21, 2018 at 10:42 AM, Pedro Larroy <
> pedro.larroy.li...@gmail.com
> > wrote:
>
> > I think I have fixed this before, I will check if the patch didn't make
> it
> > to the branch.
> >
> > On Thu, Jun 21, 2018 at 10:24 AM Pedro Larroy <
> > pedro.larroy.li...@gmail.com>
> > wrote:
> >
> > > -1   I can't compile:
> > >
> > > 3rdparty/dmlc-core/libdmlc.a(io.cc.o): In function
> > > `std::thread::thread<...>(dmlc::ThreadedIter<dmlc::io::InputSplitBase::Chunk>::Init(std::function<... (dmlc::io::InputSplitBase::Chunk**)>, std::function<...>)::{lambda()#1}&)':
> > > /usr/include/c++/5/thread:137: undefined reference to `pthread_create'
> > > collect2: error: ld returned 1 exit status
> > > ninja: build stopped: subcommand failed.
> > >
> > >
> > > No LSB modules are available.
> > > Distributor ID: Ubuntu
> > > Description:Ubuntu 16.04.4 LTS
> > > Release:16.04
> > > Codename:   xenial
> > >
> > >
> > > My build script:
> > >
> > >
> > > #!/bin/bash
> > > set -e
> > > set -x
> > >
> > > renice -n 19 -p $$
> > >
> > > mkdir -p build && cd build
> > > cmake -DUSE_CPP_PACKAGE=ON -DUSE_CUDA=OFF -DUSE_OPENMP=OFF
> > -DUSE_OPENCV=ON
> > > -DCMAKE_BUILD_TYPE=Debug -GNinja ..
> > > ninja -v
> > >
> > > cd ..
> > > if [ ! -d mxnet_py3 ]; then
> > > virtualenv -p `which python3` mxnet_py3
> > > fi
> > > source mxnet_py3/bin/activate
> > > cd python
> > > pip install -e .
> > > cd ..
> > > pip install opencv-python
> > > pip install ipython
> > > pip install matplotlib
> > >
> > > On Wed, Jun 20, 2018 at 6:33 PM Indhu  wrote:
> > >
> > >> +1
> > >>
> > >> On Mon, Jun 18, 2018, 6:52 PM Anirudh  wrote:
> > >>
> > >> > Hi,
> > >> >
> > >> > This is the vote to release Apache MXNet (incubating) version 1.2.1.
> > >> Voting
> > >> > will start now and close Thursday June 21st 7:00 PM PDT.
> > >> >
> > >> > Link to release candidate 1.2.1.rc0:
> > >> >
> > >> > https://github.com/apache/incubator-mxnet/releases/tag/1.2.1.rc0
> > >> >
> > >> > View this page for installation instructions:
> > >> >
> > >> > https://mxnet.incubator.apache.org/install/index.html
> > >> >
> > >> > (Note: The README.md points to the 1.2.1 tag and does not work at
> the
> > >> > moment).
> > >> >
> > >> > Please remember to test first before voting accordingly.
> > >> >
> > >> > +1 = approve
> > >> > +0 = no opinion
> > >> > -1 = disapprove (provide reason)
> > >> >
> > >> > Anirudh
> > >> >
> > >>
> > >
> >
>


Re: The operator check for Scala Package

2018-06-20 Thread Haibin Lin
I appreciate the effort and understand the motivation. However, I'm
concerned that it basically means merging operator PRs becomes sequential.
Developers who work on operators have to update their PR every time a new
operator is merged to master, the burden becomes significant if there're 20
ONNX/sparse operators to add and many PRs are submitted/reviewed in
parallel.

On Wed, Jun 20, 2018 at 10:13 AM, Qing Lan  wrote:

> Hi Haibin,
>
> The operator change means any change to an operator on the C++ side. Does
> it trigger the check to fail?
>- change the documentation of operator in C  Yes
>- change the documentation such as README.md No
>- add/remove/modify operator Yes
>- add/remove/modify operator parameter   Yes
>
> Thanks,
> Qing
>
> On 6/20/18, 10:01 AM, "Haibin Lin"  wrote:
>
> Could you elaborate what you mean by operator change? Does it check the
> operator interface? Would updated operator documentation fail the
> check?
> Would adding a new operator fail this check?
>
>
>
> On Wed, Jun 20, 2018 at 9:48 AM, Qing Lan  wrote:
>
> > Hi Marco,
> >
> > Thanks for your feedback! I believe this should not be a blocker for
> > contributors in most cases.
> > Firstly, this would only be triggered if there are operator changes
> > (only general operators).
> > Secondly, it is simple to get through: changing one line of code is
> > enough to make the PR pass. However, it does require the contributor to
> > do this, or the Scalatest will fail. I have made the error message
> > instructive so that it helps the contributor dive in and make the
> > changes.
> >
> > I also updated the design document to explain in detail.
> >
> > Thanks,
> > Qing
> >
> >
> > On 6/19/18, 12:09 PM, "Marco de Abreu"  .INVALID>
> > wrote:
> >
> > Okay, thanks for elaborating. I definitely see your point there
> and we
> > definitely don't want these changes to pile up.
> >
> > I don't feel strongly about this and won't stand in the way, I
> just
> > want to
> > express my concern that this could lead to people having to
> touch all
> > language interfaces although they might not familiar with them
> at all.
> > On
> > the other hand we got enough contributors who could help them
> then
> > before
> > the PR can get merged. So either way works, but I just wanted to
> > highlight
> > that this could make it harder to make changes in the backend for
> > people
> > who are not familiar with our frontend API languages. If we got
> enough
> > people who could actively support our contributors in such a
> case, we
> > should be totally fine with blocking a PR until the APIs have
> been
> > adapted.
> >
> > -Marco
> >
> > On Tue, Jun 19, 2018 at 11:58 AM Naveen Swamy <
> mnnav...@gmail.com>
> > wrote:
> >
> > > Marco,
> > >
> > > Qing and I are working together on this. The idea is that we fail the
> > > build if there is an operator change on the backend that has not been
> > > synced to the Scala API. We want to catch this before breaking the
> > > user's code, which would be a pretty bad experience.
> > >
> > >
> > >
> > > On Tue, Jun 19, 2018 at 11:54 AM, Marco de Abreu <
> > > marco.g.ab...@googlemail.com.invalid> wrote:
> > >
> > > > Hi Qing,
> > > >
> > > > thank you for working on improving the compatibility of our
> APIs!
> > > >
> > > > Your linked proposal does not describe the mentioned
> FILEHASH.
> > Could you
> > > > elaborate a bit? Would this be a hash of the entire file,
> some hash
> > > created
> > > > based on the signature of the underlying C++ methods or
> maybe a
> > different
> > > > approach?
> > > >
> > > > Also, at which step would developers be notified of the
> change? I'd
> > > propose
> > > > that we make this check a nightly job t
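The FILEHASH idea raised in this thread — a digest over operator signatures that a frontend binding can compare against to detect backend drift — could be sketched roughly as below. This is purely illustrative: the function name and the `(name, params)` shape are hypothetical, not the actual Scala-package check.

```python
import hashlib

def operator_signature_hash(operators):
    """Digest over a set of operator signatures (illustrative sketch).

    `operators` is an iterable of (name, params) pairs, where params is a
    list of parameter names. Any added/removed operator or changed
    parameter list changes the digest, so a frontend binding's build can
    fail fast when it is out of sync with the backend.
    """
    h = hashlib.sha256()
    # Normalize ordering so the hash is independent of registration order.
    normalized = sorted((name, tuple(sorted(params))) for name, params in operators)
    for name, params in normalized:
        h.update(name.encode("utf-8") + b"\x00")  # separators avoid collisions
        for p in params:
            h.update(p.encode("utf-8") + b"\x00")
        h.update(b"\x01")
    return h.hexdigest()
```

A per-operator variant of the same digest would also allow the check to report exactly which operator changed, rather than just that something did.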

Re: Reverting pull request

2018-06-15 Thread Haibin Lin
Why revert the PR when we know there's a fix?
If we keep going backwards like this, no progress can be made.

On Fri, Jun 15, 2018 at 2:37 PM, Mu Li  wrote:

> Agree that major changes need more extensive reviews. But we cannot ignore
> that both reviews and CI cannot catch all bugs. Reverting each PR after
> finding a bug should be the last ways, before it, we should try to fix it
> first.
>
> As for the breaking change, I see it differently. It breaks a not
> recommended usage of the API from an unmaintained tutorial, I don't think
> adding more reviewers will help it.
>
> Besides, I'm less sure if we can find enough reviewers to provide useful
> feedback for major changes.
>
> On Fri, Jun 15, 2018 at 2:21 PM, Marco de Abreu <
> marco.g.ab...@googlemail.com.invalid> wrote:
>
> > We revert a PR because it should not have been merged in the first place.
> > So far, I have been ignoring the fact that our committers are constantly
> > breaking our own rules (which we expect contributors to follow). But
> since
> > this caused an impact twice (1.2 breaking change about model
> import/export
> > as well as this regression), I'm now being more strict and enforcing
> them.
> >
> > I could've also made a script that prevents any PR from being
> self-merged,
> > but I thought our committers are responsible enough to follow our own
> rules
> > without systems actually enforcing them. I won't waste my time working on
> > that script, but from now on I will revert every single PR (except
> > emergency cases) that has been self-merged without approval.
> >
> > -Marco
> >
> > On Fri, Jun 15, 2018 at 2:15 PM Mu Li  wrote:
> >
> > > Why reverting instead of fixing the bugs? Static memory aims to reduce
> > > memory allocation, it's a key feature to bridge the perf gap between
> > gluon
> > > and symbol.
> > >
> > > On Fri, Jun 15, 2018 at 2:06 PM, Marco de Abreu <
> > > marco.g.ab...@googlemail.com.invalid> wrote:
> > >
> > > > Hello,
> > > >
> > > > I'm reverting https://github.com/apache/incubator-mxnet/pull/10817
> as
> > of
> > > > https://github.com/apache/incubator-mxnet/pull/11311 due to
> > regressions
> > > > described in https://github.com/apache/incubator-mxnet/issues/11171
> > and
> > > > https://github.com/apache/incubator-mxnet/pull/10817.
> > > >
> > > > The pull request has been self-merged without proper review and
> > > introduced
> > > > regressions. Committers should act as role models in this project and
> > > > adhere to software engineering best practices.
> > > >
> > > > Best regards,
> > > > Marco
> > > >
> > >
> >
>


Re: Make cmake default

2018-06-01 Thread Haibin Lin
+1

Thanks for bringing this up. Maintaining two build systems is a pain.

If we decide to make cmake default, please make sure all installation
documentations are updated correspondingly. They're currently all using
"make" if installed from source.

Best,
Haibin

On Fri, Jun 1, 2018 at 3:06 PM, Anirudh  wrote:

> +1 to using cmake and deprecating Makefile. I was able to find a previous
> discussion on this:
> https://github.com/apache/incubator-mxnet/issues/8702
>
> The concerns raised were
> 1. Building on devices like raspberry pi where cmake is non existent or
> old.
> 2. Adding an additional dependency.
>
> As mentioned in the thread, if we provide good instructions on how to
> install cmake/build cmake from source,
> these concerns will be addressed.
>
> Anirudh
>
> On Fri, Jun 1, 2018 at 2:58 PM, Alex Zai  wrote:
>
> > Just realized that the email list strips away all hyperlinks. Attached
> > is a
> > copy of my previous email with links pasted in.
> >
> > What are peoples' thought on requiring cmake when building from source?
> > Currently we have to maintain two independent build files (CMakeLists and
> > Makefile) which makes it more difficult to develop (each are 600+ lines).
> > Also,
> > our current build system (in Makefile) requires that 3rdparty
> dependencies
> > have
> > binaries present (or a Makefile to generate binaries) in the repo, which
> > is not
> > always the case.
> > Generating a makefile with cmake will make our Makefile very simple like
> > PyTorch's Makefile (20 lines of code -
> > https://github.com/pytorch/pytorch/blob/master/Makefile). Also, not all
> > 3rdparty
> > dependencies have binaries or Makefiles. For 3rdparty/mkldnn we end up
> > calling
> > cmake
> >  (https://github.com/apache/incubator-mxnet/blob/master/
> > prepare_mkldnn.sh#L96)
> > to generate binaries (this does not violate our 'no cmake dependency' as
> > USE_MKLDNN is OFF by default). If we encounter any library in the future
> > that
> > requires us to generate artifacts with cmake, it would be better to make
> > the
> > switch now. Lastly, we already require cmake as a dependency for windows'
> > developers
> >  (https://www.dropbox.com/s/9sfnderg58z4j1l/Screenshot%
> > 202018-06-01%2013.43.08.png?dl=0)
> > so this would only affect linux / mac developers who do not have cmake
> > already.
> > I currently have a pending PR
> >  (https://github.com/apache/incubator-mxnet/pull/8/) that depends on
> > this
> > change. The library does not have a Makefile or binaries present. Unlike
> > mkldnn,
> > we would want this library included by default so I cannot generate
> > artifacts
> > with cmake. The alternative would be to strip out only the relevant parts
> > of the
> > code we need from the library. I did this in a previous version of my PR
> >  (https://github.com/apache/incubator-mxnet/compare/
> > dfdfd1ad15de8bb1b899effb0860a4e834093cfc...
> a4267eb80488804a7f74ff01f5627c
> > 47dd46bd78)
> > but it is incredibly messy.
> > Please let me know your thoughts.
> > Best,
> > Alex
> >
> >
> >
> >
> >
> > On Fri, Jun 1, 2018 2:51 PM, Alex Zai aza...@gmail.com  wrote:
> > What are peoples' thought on requiring cmake when building from source?
> > Currently we have to maintain two independent build files (CMakeLists and
> > Makefile) which makes it more difficult to develop (each are 600+ lines).
> > Also,
> > our current build system (in Makefile) requires that 3rdparty
> dependencies
> > have
> > binaries present (or a Makefile to generate binaries) in the repo, which
> > is not
> > always the case.
> > Generating a makefile with cmake will make our Makefile very simple like
> > PyTorch's Makefile (20 lines of code). Also, not all 3rdparty
> dependencies
> > have
> > binaries or Makefiles. For 3rdparty/mkldnn we end up calling cmake to
> > generate
> > binaries (this does not violate our 'no cmake dependency' as USE_MKLDNN
> is
> > OFF
> > by default). If we encounter any library in the future that requires us
> to
> > generate artifacts with cmake, it would be better to make the switch now.
> > Lastly, we already require cmake as a dependency for windows'
> > developers so this
> > would only affect linux / mac developers who do not have cmake already.
> > I currently have a pending PR that depends on this change. The library
> > does not
> > have a Makefile or binaries present. Unlike mkldnn, we would want this
> > library
> > included by default so I cannot generate artifacts with cmake. The
> > alternative
> > would be to strip out only the relevant parts of the code we need from
> the
> > library. I did this in a previous version of my PR, but it is incredibly
> > messy.
> > Please let me know your thoughts.
> > Best,
> > Alex
> >
>


Re: [LAZY VOTE][RESULT] Upgrade CI to CUDA 9.1 with CuDNN 7.0

2018-05-16 Thread Haibin Lin
Is there a plan for adding those CUDA 8 tests back to CI? What about CUDA 7?

There were a few build problems in the past few weeks due to lack of CI
coverage:
- https://github.com/apache/incubator-mxnet/pull/10710 was found during
1.2 rc voting
- https://github.com/apache/incubator-mxnet/issues/10981 was reported by
a user with CUDA 7

Having these covered in CI will help catch the issues early. I don't recall
if we decided to drop CUDA 7 support for MXNet.

Best,
Haibin

On Wed, Mar 21, 2018 at 6:32 AM, Marco de Abreu <
marco.g.ab...@googlemail.com> wrote:

> Hello,
>
> the migration has just been completed and we're now running our UNIX based
> slaves on CUDA 9.1 with CuDNN 7. The commit is available at
> https://github.com/apache/incubator-mxnet/commit/
> b0a6760efa141aeca87b03ecf34dae924bd1af46
> .
>
> No jobs have been interrupted by this migration. If you encounter any
> errors, please reach back to me.
>
> Best regards,
> Marco
>
> On Tue, Mar 20, 2018 at 11:20 PM, Marco de Abreu <
> marco.g.ab...@googlemail.com> wrote:
>
> > Hello,
> >
> > the results of this vote are as follows:
> >
> > +1:
> > Jun
> > Anirudh
> > Hao
> > Marco
> >
> > 0:
> > Chris
> >
> > -1:
> > Naveen (veto recalled as of https://lists.apache.org/thread.html/
> > 242db72a0c96349ef6e0ff1d3b1fe0dc7f7a9082532724c3293666c5@%
> > 3Cdev.mxnet.apache.org%3E)
> >
> > Under the constraint that we will use CUDA 8 on Windows and CUDA 9.1 on
> > UNIX slaves and work on integration tests for CUDA 8 in the long term,
> this
> > vote counts as PASSED.
> >
> > The PR for this change is available at https://github.com/apache/
> > incubator-mxnet/pull/10108. I have developed and tested the new slaves in
> > our test environment and everything looks promising so far. The plan is
> as
> > follows:
> >
> >1. Get https://github.com/apache/incubator-mxnet/pull/10108 approved
> >to allow self-merge – CI can’t pass until slaves have been upgraded.
> >2. Replace all existing slaves with new upgraded slaves.
> >3. Retrigger https://github.com/apache/incubator-mxnet/pull/10108 to
> >merge necessary changes into master.
> >
> > IMPORTANT: The migration will happen tomorrow, so please expect some
> delay
> > in job execution - the CI website will be unaffected. Ideally, no jobs
> > should fail - in case they do, please feel free to retrigger them by
> using
> > an empty commit. In case of any errors appearing after the upgrade, don't
> > hesitate to contact me!
> >
> > Best regards,
> > Marco
> >
> >
> > On Tue, Mar 20, 2018 at 1:39 AM, Naveen Swamy 
> wrote:
> >
> >> Yes, for short-term.
> >>
> >> On Monday, March 19, 2018, Chris Olivier 
> wrote:
> >>
> >> > In the short term, Naveen, are you ok with Linux running CUDA 9 and
> >> Windows
> >> > CUDA 8 in order to get CUDA version coverage?
> >> >
> >> > On 2018/03/16 21:09:09, Marco de Abreu 
> >> > wrote:
> >> > > Thanks for your input. How would you propose to proceed in terms of
> a
> >> > > timeline in case this vote succeedes? I don't really have time to
> work
> >> > on a
> >> > > nightly setup right now. Would anybody in the community be able to
> >> help
> >> > me
> >> > > out here or shall we wait with the migration until a nightly setup
> for
> >> > CUDA
> >> > > 8 is up?
> >> > >
> >> > > -Marco
> >> > >
> >> > > On Fri, Mar 16, 2018 at 9:55 PM, Bhavin Thaker <
> >> bhavintha...@gmail.com>
> >> > > wrote:
> >> > >
> >> > > > +1 to the suggestion of testing CUDA8 in few nightly instances and
> >> > using
> >> > > > CUDA9 for most instances in CI.
> >> > > >
> >> > > > Bhavin Thaker.
> >> > > >
> >> > > > On Fri, Mar 16, 2018 at 12:37 PM Naveen Swamy  >
> >> > wrote:
> >> > > >
> >> > > > > I think its best to add support for CUDA 9.0 while retaining
> >> existing
> >> > > > > support for CUDA 8, code might regress when you remove and
> create
> >> > more
> >> > > > work
> >> > > > > to add CUDA 8 support back.
> >> > > > >
> >> > > > > On Fri, Mar 16, 2018 at 9:29 AM, Marco de Abreu <
> >> > > > > marco.g.ab...@googlemail.com> wrote:
> >> > > > >
> >> > > > > > Yeah, sorry Chris, mixed up the names.
> >> > > > > >
> >> > > > > > @Naveen: Would you be fine with doing the switch now and
> adding
> >> > > > > integration
> >> > > > > > tests later or is this a hard constraint for you?
> >> > > > > >
> >> > > > > > On Wed, Mar 14, 2018 at 6:39 PM, Chris Olivier <
> >> > cjolivie...@gmail.com>
> >> > > > > > wrote:
> >> > > > > >
> >> > > > > > > Isn't the TItan V the Volta and not the Tesla?
> >> > > > > > >
> >> > > > > > > On Wed, Mar 14, 2018 at 10:36 AM, Naveen Swamy <
> >> > mnnav...@gmail.com>
> >> > > > > > wrote:
> >> > > > > > >
> >> > > > > > > > Marco,
> >> > > > > > > > My -1 vote is for dropping support to CUDA 8 and not for
> >> adding
> >> > > > CUDA
> >> > > > > 9.
> >> > > > > > > > CUDA 9.0 support for MXNet was added Oct'30-2017, I think
> >> that
> >> > all
> >> 

Re: Problems with test_sparse_operator.test_sparse_mathematical_core

2018-05-09 Thread Haibin Lin
Hi Marco,

Is auto scaling already enabled on the mxnet apache CI, or does this only
happen on your setup? I see the test uses scipy. Do both environments have
the same version of scipy installed?

I have recently seen lots of test failures on mxnet master. One thing on my
wish list is a database that stores all occurrences of test failures along
with their commit ids, which would be very helpful for an initial diagnosis
of which code changes potentially introduced bugs. Otherwise, clicking
through all past tests and reading those logs requires a lot of manual work.

Best,
Haibin

On Wed, May 9, 2018 at 5:32 AM, Marco de Abreu  wrote:

> Hello,
>
> I'm currently working on auto scaling and encountering a consistent test
> failure on CPU. At the moment, I'm not really sure what's causing this,
> considering the setup should be identical.
>
> http://jenkins.mxnet-ci-dev.amazon-ml.com/blue/organizations/jenkins/
> incubator-mxnet/detail/ci-master/557/pipeline/694
>
> ==
>
> FAIL: test_sparse_operator.test_sparse_mathematical_core
>
> --
>
> Traceback (most recent call last):
>
>   File "/usr/local/lib/python3.5/dist-packages/nose/case.py", line 198, in
> runTest
>
> self.test(*self.arg)
>
>   File "/work/mxnet/tests/python/unittest/common.py", line 157, in
> test_new
>
> orig_test(*args, **kwargs)
>
>   File "/work/mxnet/tests/python/unittest/test_sparse_operator.py", line
> 1084, in test_sparse_mathematical_core
>
> density=density, ograd_density=ograd_density)
>
>   File "/work/mxnet/tests/python/unittest/test_sparse_operator.py", line
> 1056, in check_mathematical_core
>
> density=density, ograd_density=ograd_density)
>
>   File "/work/mxnet/tests/python/unittest/test_sparse_operator.py", line
> 698, in check_sparse_mathematical_core
>
> assert_almost_equal(arr_grad, input_grad, equal_nan=True)
>
>   File "/work/mxnet/python/mxnet/test_utils.py", line 493, in
> assert_almost_equal
>
> raise AssertionError(msg)
>
> AssertionError:
>
> Items are not equal:
>
> Error nan exceeds tolerance rtol=0.10, atol=0.00.  Location of
> maximum error:(0, 0), a=inf, b=-inf
>
>  a: array([[inf],
>
>[inf],
>
>[inf],...
>
>  b: array([[-inf],
>
>[-inf],
>
>[-inf],...
>
>  >> begin captured stdout << -
>
> pass 0
>
> 0.0, 0.0, False
>
> - >> end captured stdout << --
>
>  >> begin captured logging << 
>
> common: INFO: Setting test np/mx/python random seeds, use
> MXNET_TEST_SEED=2103230797 to reproduce.
>
> - >> end captured logging << -
>
>
> Does this ring any bells?
>
> Thanks in advance!
>
> -Marco
>


Re: [VOTE] Release Apache MXNet(incubating) version 1.2.0.RC2

2018-05-04 Thread Haibin Lin
I agree with Anirudh that the focus of the discussion should be limited to
the release branch, not the master branch. Anything that breaks on master
but works on release branch should not block the release itself.


Best,

Haibin

On Fri, May 4, 2018 at 10:58 AM, Pedro Larroy 
wrote:

> I see your point.
>
> I checked the failures on the v1.2.0 branch and I don't see segfaults, just
> minor failures due to flaky tests.
>
> I will trigger it repeatedly a few times until Sunday and will change
> my vote accordingly.
>
> http://jenkins.mxnet-ci.amazon-ml.com/job/incubator-mxnet/job/v1.2.0/
> http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/
> incubator-mxnet/detail/v1.2.0/17/pipeline
> http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/
> incubator-mxnet/detail/v1.2.0/15/pipeline/
>
>
> Pedro.
>
> On Fri, May 4, 2018 at 7:16 PM, Anirudh  wrote:
>
> > Hi Pedro,
> >
> > Thank you for the suggestions. I will try to reproduce this without fixed
> > seeds and also run it for a longer time duration.
> > Having said that, running unit tests over and over for a couple of days
> > will likely cause
> > problems because there are around 42 open issues for flaky tests:
> > https://github.com/apache/incubator-mxnet/issues?q=is%
> > 3Aopen+is%3Aissue+label%3AFlaky
> > Also, the release branch has diverged from master around 3 weeks back and
> > it doesn't have many of the changes merged to the master.
> > So, my question essentially is, what will be your benchmark to accept
> > the release? Is it that we run the test which you provided on 1.2
> > without fixed seeds and for a longer duration without failures? Or is
> > it that all unit tests should pass over a period of 2 days without
> > issues? That may require fixing all of the flaky tests, which would
> > delay the release by a considerable amount of time.
> > Or is it something else?
> >
> > Anirudh
> >
> >
> > On Fri, May 4, 2018 at 4:49 AM, Pedro Larroy <
> pedro.larroy.li...@gmail.com
> > >
> > wrote:
> >
> > > Could you remove the fixed seeds and run it for a couple of hours with
> an
> > > additional loop?  Also I would suggest running the unit tests over and
> > over
> > > for a couple of days if possible.
> > >
> > >
> > > Pedro.
> > >
> > > On Thu, May 3, 2018 at 8:33 PM, Anirudh  wrote:
> > >
> > > > Hi Pedro and Naveen,
> > > >
> > > > I am able to reproduce this issue with MKLDNN on the master but not
> > > > on the 1.2.RC2 branch.
> > > >
> > > > Did the following on 1.2.RC2 branch:
> > > >
> > > > make -j $(nproc) USE_OPENCV=1 USE_BLAS=openblas USE_DIST_KVSTORE=0
> > > > USE_CUDA=0 USE_CUDNN=0 USE_MKLDNN=1
> > > > export MXNET_STORAGE_FALLBACK_LOG_VERBOSE=0
> > > > export MXNET_TEST_SEED=11
> > > > export MXNET_MODULE_SEED=812478194
> > > > export MXNET_TEST_COUNT=1
> > > > nosetests-2.7 -v tests/python/unittest/test_
> > > module.py:test_forward_reshape
> > > >
> > > > Was able to do the 10k runs successfully.
> > > >
> > > > Anirudh
> > > >
> > > > On Thu, May 3, 2018 at 8:46 AM, Anirudh 
> wrote:
> > > >
> > > > > Hi Pedro and Naveen,
> > > > >
> > > > > Is this issue reproducible when MXNet is built with USE_MKLDNN=0?
> > > > > Also, there are a bunch of MKLDNN fixes that didn't go into the
> > release
> > > > > branch. Is this issue reproducible on the release branch ?
> > > > > In my opinion, since we have marked MKLDNN as experimental feature
> > for
> > > > the
> > > > > release, if it is confirmed to be a MKLDNN issue
> > > > > we don't need to block the release on it.
> > > > >
> > > > > Anirudh
> > > > >
> > > > > On Thu, May 3, 2018 at 6:58 AM, Naveen Swamy 
> > > wrote:
> > > > >
> > > > >> Thanks for raising this issue Pedro.
> > > > >>
> > > > >> -1(binding)
> > > > >>
> > > > >> We were in a similar state for a while a year ago, a lot of effort
> > > went
> > > > to
> > > > >> stabilize the tests and the CI. I have seen the PR builds are
> > > > >> non-deterministic and you have to retry over and over (wasting
> > > resources
> > > > >> and time) and hope you get lucky.
> > > > >>
> > > > >> Look at the dashboard for master build
> > > > >> http://jenkins.mxnet-ci.amazon-ml.com/job/incubator-
> > mxnet/job/master/
> > > > >>
> > > > >> -Naveen
> > > > >>
> > > > >> On Thu, May 3, 2018 at 5:11 AM, Pedro Larroy <
> > > > >> pedro.larroy.li...@gmail.com>
> > > > >> wrote:
> > > > >>
> > > > >> > -1 nondeterministic failures on CI master:
> > > > >> > https://issues.apache.org/jira/browse/MXNET-396
> > > > >> >
> > > > >> > Was able to reproduce once in a fresh p3 instance with DLAMI
> > can't
> > > > >> > reproduce consistently.
> > > > >> >
> > > > >> > On Wed, May 2, 2018 at 9:51 PM, Anirudh 
> > > > wrote:
> > > > >> >
> > > > >> > > Hi all,
> > > > >> > >
> > > > >> > > As part of RC2 release, we have addressed bugs and some
> concerns
> > > > that
> 

Re: [VOTE] Release Apache MXNet (incubating) version 1.2.0.RC0

2018-04-23 Thread Haibin Lin
Hi Da,

After looking at your detailed description in github issue
https://github.com/apache/incubator-mxnet/issues/10663, I would argue that
the fix should go to mxnet-onnx instead of mxnet-mkldnn.

In onnx, padding params are in the form of (left, right, top, bottom), which
supports asymmetric padding. In MXNet, only symmetric padding is supported,
in the form of a (height, width) tuple.

If you remove the check in mkldnn, an onnx-mxnet user can pass in an
asymmetric padding of (left, right, top, bottom), but mxnet would only look
at the first two numbers in the tuple and silently compute the wrong result.

One way to fix it is to support (left, right, top, bottom) in mxnet padding
op (both with mkl-dnn and without mkl-dnn), which will take some time.
Another way is that mxnet-onnx checks the (left, right, top, bottom) tuple,
and pass the arguments in the form of (height, width) to mxnet ops if
symmetric, or throw an exception for asymmetric paddings.
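The second approach could be sketched as follows — a minimal illustration only; the function name and the (left, right, top, bottom) tuple ordering follow the description above and are assumptions, not the actual onnx-mxnet importer API:

```python
# Hypothetical sketch: validate an ONNX-style (left, right, top, bottom)
# padding tuple at import time, and either collapse it to MXNet's symmetric
# (height, width) form or fail loudly instead of silently dropping values.
def convert_onnx_padding(pads):
    left, right, top, bottom = pads
    if left != right or top != bottom:
        raise ValueError(
            "MXNet convolution only supports symmetric padding; "
            "got asymmetric pads %s" % (pads,))
    # Symmetric case: top/bottom give the height pad, left/right the width pad.
    return (top, left)

print(convert_onnx_padding((1, 1, 2, 2)))  # -> (2, 1)
```

With such a check in place, asymmetric models fail fast at import time rather than producing silently wrong convolution results.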

I'm not an expert in onnx but I'm curious if this can be fixed in
mxnet-onnx quickly. Removing the correct check in mkldnn doesn't sound
reasonable...

Best,
Haibin

On Mon, Apr 23, 2018 at 5:12 PM, Zheng, Da  wrote:

> I think I have found the root of the problem.
>
> The tutorial loads a model from onnx, which uses padding (left, right,
> top, bottom). But mxnet uses padding (height, width). Currently, when an
> ONNX model is loaded to MXNet, the padding is converted correctly. MXNet
> conv doesn't check the number of elements in the padding and ignores the
> problem. However, mxnet-mkldnn checks it and fails.
>
> The correct way of fixing this issue is to check the number of elements in
> the padding tuple in mxnet conv. If the tuple size mismatches, it should
> fail. When an ONNX model is loaded to MXNet, the padding should be
> converted correctly.
>
> For the time being, I'll just fix MKLDNN so it doesn't check the tuple
> length of padding.
>
> Best,
> Da
>
> On 4/23/18, 2:58 PM, "Zheng, Da"  wrote:
>
> I can reproduce the bug now. I'm working on a fix for the bug.
>
> Currently, there are a few more bug fixes for MKLDNN.
> https://github.com/apache/incubator-mxnet/pull/10651
> https://github.com/apache/incubator-mxnet/pull/10624
> https://github.com/apache/incubator-mxnet/pull/10619/files
> https://github.com/apache/incubator-mxnet/pull/10616
> https://github.com/apache/incubator-mxnet/pull/10591/files
>
> They are ready for review.
>
> I just discussed with @Anirudh. Maybe we should say in the release
> note that MKLDNN in MXNet is still experimental.
> What do you think?
>
> Best,
> Da
>
> On 4/21/18, 7:59 PM, "Zheng, Da"  wrote:
>
> It seems I have problems of compiling scala when running "make
> docs". Please see the error below.
>
> Are there any instructions of compiling these scala code? I guess
> I might miss some packages.
> I tried installing libslf4j-java and didn't help.
>
> Best,
> Da
>
> Execute "cd /home/ubuntu/apache-mxnet-src-
> 1.2.0.rc0-incubating/docs/../scala-package; scaladoc `find . -type f
> -name "*.scala" | egrep "\/core|\/infer" | egrep -v "Suite"`; exit 0"
> ./examples/src/main/scala/org/apache/mxnetexamples/infer/
> objectdetector/SSDClassifierExample.scala:24: error: object kohsuke is
> not a member of package org
> import org.kohsuke.args4j.{CmdLineParser, Option}
>^
> ./examples/src/main/scala/org/apache/mxnetexamples/infer/
> objectdetector/SSDClassifierExample.scala:25: error: object slf4j is not
> a member of package org
> import org.slf4j.LoggerFactory
>^
> ./examples/src/main/scala/org/apache/mxnetexamples/infer/
> objectdetector/SSDClassifierExample.scala:41: error: class Option is
> abstract; cannot be instantiated
>   @Option(name = "--model-path-prefix", usage = "the input model
> directory and prefix of the model")
>^
> warning: no valid targets for annotation on value modelPathPrefix
> - it is discarded unused. You may specify targets with meta-annotations,
> e.g. @( @getter)
> ./examples/src/main/scala/org/apache/mxnetexamples/infer/
> objectdetector/SSDClassifierExample.scala:43: error: class Option is
> abstract; cannot be instantiated
>   @Option(name = "--input-image", usage = "the input image")
>^
> warning: no valid targets for annotation on value inputImagePath -
> it is discarded unused. You may specify targets with meta-annotations, e.g.
> @( @getter)
> ./examples/src/main/scala/org/apache/mxnetexamples/infer/
> objectdetector/SSDClassifierExample.scala:45: error: class Option is
> abstract; cannot be instantiated
>   @Option(name = "--input-dir", usage = "the input batch of images
> directory")
>^
> warning: no valid 

Sparse support for Gluon

2018-04-22 Thread Haibin Lin
Hi everyone,

I drafted a design proposal for supporting sparse ndarrays in Gluon. Please see 
https://cwiki.apache.org/confluence/display/MXNET/Gluon+Sparse+Support for 
details and give suggestions. Thanks!

Best,
Haibin 


Re: PR build failed because of git errors

2018-03-29 Thread Haibin Lin
I've seen this before. Try rebasing and force pushing.

On Thu, Mar 29, 2018 at 3:51 PM, Indhu  wrote:

> Hi,
>
> Looks like PR #10039 build failed because of git errors. Here is the error
> log:
> http://jenkins.mxnet-ci.amazon-ml.com/job/incubator-
> mxnet/job/PR-10039/4/console.
> Does someone know what could be happening here?
>
> Build error:
>
> Adding as 3rdparty/dlpack~7c28089749287f42ea8f41abd1358e6dbac54187 instead
> Automatic merge failed; fix conflicts and then commit the result.
>
> stderr:
> at
> org.jenkinsci.plugins.gitclient.CliGitAPIImpl.
> launchCommandIn(CliGitAPIImpl.java:1990)
> at
> org.jenkinsci.plugins.gitclient.CliGitAPIImpl.
> launchCommandIn(CliGitAPIImpl.java:1958)
> at
> org.jenkinsci.plugins.gitclient.CliGitAPIImpl.
> launchCommandIn(CliGitAPIImpl.java:1954)
> at
> org.jenkinsci.plugins.gitclient.CliGitAPIImpl.launchCommand(CliGitAPIImpl.
> java:1592)
> at
> org.jenkinsci.plugins.gitclient.CliGitAPIImpl$3.
> execute(CliGitAPIImpl.java:692)
> at
> jenkins.plugins.git.MergeWithGitSCMExtension.decorateRevisionToBuild(
> MergeWithGitSCMExtension.java:122)
> at hudson.plugins.git.GitSCM.determineRevisionToBuild(GitSCM.java:1068)
> at hudson.plugins.git.GitSCM.checkout(GitSCM.java:1161)
> at
> org.jenkinsci.plugins.workflow.steps.scm.SCMStep.
> checkout(SCMStep.java:113)
> at
> org.jenkinsci.plugins.workflow.cps.CpsScmFlowDefinition.create(
> CpsScmFlowDefinition.java:130)
> at
> org.jenkinsci.plugins.workflow.multibranch.SCMBinder.create(SCMBinder.
> java:120)
> at org.jenkinsci.plugins.workflow.job.WorkflowRun.run(
> WorkflowRun.java:263)
> at hudson.model.ResourceController.execute(ResourceController.java:97)
> at hudson.model.Executor.run(Executor.java:429)
> Finished: FAILURE
>
> Thanks,
> Indu
>


Release Status - MXNet 1.1.0.RC0

2018-02-08 Thread Haibin Lin
Hi all,

The vote on general@ for 1.1.0.RC0 failed with 2 -1 votes (2 bindings) and
no 0 or +1 votes.
-1 votes (binding) and reasons:
Justin and Henri - LICENSE has issues

Vote thread on general@:
https://lists.apache.org/thread.html/453663efa82e829e746ace79ece7b87209a856f7a7a6e3b2bfaf0d8f@%3Cgeneral.incubator.apache.org%3E

PR #9701 (from @mbaijal) which updated the LICENSE file was approved by
mentors and merged yesterday. Another release candidate with an updated
LICENSE file will be proposed soon.

Unfortunately, I will be traveling in China in the next three weeks. I'll
let Yizhi lead the remaining efforts for 1.1.0 release.

Thanks,
Haibin


Re: JIRA notifications on dev@

2018-02-06 Thread Haibin Lin
+1 to disable automatic notifications to dev@.

On Tue, Feb 6, 2018 at 5:05 PM, Marco de Abreu  wrote:

> Haha, we need to have a discussion here first before we can make the
> change. I'd like the opinion of the community on this one.
>
> -Marco
>
> On Tue, Feb 6, 2018 at 5:04 PM, Chris Olivier 
> wrote:
>
> > Please feel free to submit a ticket to Infra to get the emails disabled.
> >
> > Oh yeah...  If we do that, they send us nasty emails...
> >
> > Waiting for Sebastian to file a ticket (I pinged on Slack).
> >
> >
> > On Tue, Feb 6, 2018 at 5:00 PM, Marco de Abreu <
> > marco.g.ab...@googlemail.com
> > > wrote:
> >
> > > Hello,
> > >
> > > while I highly appreciate the usage of JIRA within MXNet, I guess that
> > I'm
> > > not the only one bothered by the high number of JIRA notifications on
> > dev@
> > > .
> > > While everybody has the chance to create a filter for these messages,
> I'm
> > > afraid that new developers and people viewing the archive are getting
> > > pretty frustrated as actual conversations are pushed to the side by all
> > the
> > > communication happening on the JIRA tickets (see [1]). Therefore, I'd
> > > propose one of the following solutions:
> > >
> > > 1. Disable JIRA notifications to dev@ entirely, leading to only the
> > people
> > > subscribed to a ticket being notified directly.
> > > 2. Create a separate email list for JIRA in the same way as [2].
> > >
> > > Best regards,
> > > Marco
> > >
> > > [1]: https://lists.apache.org/list.html?d...@mxnet.apache.org
> > > [2]: https://lists.apache.org/list.html?comm...@mxnet.apache.org
> > >
> >
>


[jira] [Created] (MXNET-16) Move submodules to the 3rdparty folder

2018-02-06 Thread Haibin Lin (JIRA)
Haibin Lin created MXNET-16:
---

 Summary: Move submodules to the 3rdparty folder
 Key: MXNET-16
 URL: https://issues.apache.org/jira/browse/MXNET-16
 Project: Apache MXNet
  Issue Type: Task
Reporter: Haibin Lin


MXNet depends on many submodules from [dmlc|https://github.com/dmlc/] including 
nnvm, ps-lite, mshadow, dlpack and dmlc-core. These submodules are not yet 
moved into the 3rdparty folder, which creates confusion when reviewing the 
licenses. 

We need to move all these submodules to the 3rdparty folder. Specifically, we 
need to change both cmake and make build scripts, for multiple build target 
(MXNet core, cpp-package, amalgamation, etc) and make sure it is tested in 
multiple environments.  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: [RESULTS] [VOTE] Release MXNet version 1.1.0.RC0

2018-02-06 Thread Haibin Lin
Hi Marco,

The voting thread on general@ is at
https://lists.apache.org/thread.html/453663efa82e829e746ace79ece7b87209a856f7a7a6e3b2bfaf0d8f@%3Cgeneral.incubator.apache.org%3E
and
unfortunately two mentors voted -1 regarding the updates in the LICENSE
file. Another release candidate will be required for 1.1.0.

Best,
Haibin

On Tue, Feb 6, 2018 at 1:22 PM, Marco de Abreu <marco.g.ab...@googlemail.com
> wrote:

> Hello,
>
> what's the status of this release?
>
> Best regards,
> Marco
>
> On Wed, Jan 31, 2018 at 1:02 PM, Haibin Lin <haibin.lin@gmail.com>
> wrote:
>
> > This vote passes with 6 +1 votes (5 bindings) and no 0 or -1 votes.
> >
> > Binding +1:
> > Yizhi
> > Sandeep
> > Chris
> > Marco
> > Indhu
> >
> >
> > Non-binding +1:
> > Kellen
> >
> > Vote thread:
> > https://lists.apache.org/thread.html/4b9310aaa1e5c378aa91c274acf412
> > eb5b495a10fe7dad0fab653436@%3Cdev.mxnet.apache.org%3E
> >
> > I'll continue with the release process on general@ and the release
> > announcement will follow in the next few days.
> >
> > Best,
> > Haibin
> >
>


Re: Could anyone help add me into slack group?

2018-02-05 Thread Haibin Lin
Done.

On Fri, Feb 2, 2018 at 7:15 PM, Zhao, Patric  wrote:

> Thanks,
>
> --Patric
>
>


Re: Adding to slack

2018-02-05 Thread Haibin Lin
Done.

On Mon, Feb 5, 2018 at 5:22 AM, Yogesh Kumar  wrote:

>
>
>
>
>
>
> Please add me to the Slack channel
>
>
> Get Outlook for iOS
>
>


Re: [jira] [Commented] (MXNET-2) Please delete old releases from mirroring system

2018-02-05 Thread Haibin Lin
I'm going to work on this issue now. Is anyone aware of any links that might
break after the old releases are removed? If there are no major concerns,
I'll go ahead and remove them.

Best,
Haibin

On Sun, Feb 4, 2018 at 12:56 PM, Sebb (JIRA)  wrote:

>
> [ https://issues.apache.org/jira/browse/MXNET-2?page=com.
> atlassian.jira.plugin.system.issuetabpanels:comment-
> tabpanel=16351887#comment-16351887 ]
>
> Sebb commented on MXNET-2:
> --
>
> Is there anybody there?
>
> > Please delete old releases from mirroring system
> > 
> >
> > Key: MXNET-2
> > URL: https://issues.apache.org/jira/browse/MXNET-2
> > Project: Apache MXNet
> >  Issue Type: Bug
> >Reporter: Sebb
> >Priority: Major
> >
> > To reduce the load on the ASF mirrors, projects are required to delete
> old releases [1]
> > Please can you remove all non-current releases?
> > It's unfair to expect the 3rd party mirrors to carry old releases.
> > Note that older releases can still be linked from the download page, but
> such links should use the archive server at:
> > https://archive.apache.org/dist/incubator/mxnet/
> > Thanks!
> > [1] http://www.apache.org/dev/release.html#when-to-archive
>
>
>
> --
> This message was sent by Atlassian JIRA
> (v7.6.3#76005)
>


Re: Unit tests removed

2018-01-31 Thread Haibin Lin
Good catch.

In general, I agree that tests are not supposed to be removed, although CI is
not running any of these cpp tests just yet. Usually unit tests in python
for individual operators should be sufficient to test the correctness of
operators (although I don't know how/if python tests can run on edge
devices like iOS/Android) and it's not feasible to duplicate all operator
python tests with their cpp counterparts. For functionalities that are not
exposed to python or memory checks, I do agree that cpp tests are necessary.
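For context, a Python-side operator test of the kind referred to above usually checks the operator output against a NumPy reference within a tolerance. A framework-agnostic sketch — plain NumPy stands in for the actual MXNet BatchNorm forward call, which is an assumption for illustration:

```python
import numpy as np

def batchnorm_reference(x, gamma, beta, eps=1e-5):
    # NumPy reference for batch-norm forward: normalize each feature
    # over the batch axis, then scale and shift.
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    return gamma * (x - mean) / np.sqrt(var + eps) + beta

rng = np.random.RandomState(0)
x = rng.randn(8, 4).astype(np.float32)
gamma = np.ones(4, dtype=np.float32)
beta = np.zeros(4, dtype=np.float32)

expected = batchnorm_reference(x, gamma, beta)
# In a real MXNet test, `op_output` would come from the operator under test;
# here a slightly perturbed copy stands in to show the tolerance check.
op_output = expected + 1e-7
np.testing.assert_allclose(op_output, expected, rtol=1e-4, atol=1e-5)
```

The same comparison pattern extends to gradients (numerical vs. symbolic), which is roughly how the existing Python operator tests are structured.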

I am curious what are exactly tested in those cpp tests? I know that
BatchNorm ops are tested in python already, but I don't have a full picture
of all cpp tests and whether we want to re-implement them.

On the other hand, be aware that the Intel team have a few MKL-DNN related
projects depending on this PR. Reverting the PR will be a blocker for them.

Best,
Haibin

On Wed, Jan 31, 2018 at 3:30 PM, Marco de Abreu <
marco.g.ab...@googlemail.com> wrote:

> Here's a link:
> https://github.com/apache/incubator-mxnet/pull/8302#discussion_r165204667
>
> -Marco
>
> 2018-01-31 15:23 GMT-08:00 Marco de Abreu :
>
> > Hi Chris,
> >
> > considering the size of that PR, could you provide a direct link to the
> > changes you're addressing?
> >
> > -Marco
> >
> > 2018-01-31 13:39 GMT-08:00 Chris Olivier :
> >
> >> This PR was just merged that removed some 30 or so C++ unit tests for
> >> batch
> >> norm operator.
> >>
> >> https://github.com/apache/incubator-mxnet/pull/8302
> >>
> >> Is this ok?
> >>
> >
> >
>


[RESULTS] [VOTE] Release MXNet version 1.1.0.RC0

2018-01-31 Thread Haibin Lin
This vote passes with 6 +1 votes (5 bindings) and no 0 or -1 votes.

Binding +1:
Yizhi
Sandeep
Chris
Marco
Indhu


Non-binding +1:
Kellen

Vote thread:
https://lists.apache.org/thread.html/4b9310aaa1e5c378aa91c274acf412eb5b495a10fe7dad0fab653436@%3Cdev.mxnet.apache.org%3E

I'll continue with the release process on general@ and the release
announcement will follow in the next few days.

Best,
Haibin


Re: [VOTE] Release Apache MXNet (incubating) version 1.1.0.RC0

2018-01-30 Thread Haibin Lin
@Sandeep, I've updated the release assets
with apache-mxnet-src-1.1.0.rc0-incubating.tar.gz which contains submodule
source code. Please use the apache one to build. Thanks.

Best,
Haibin

On Tue, Jan 30, 2018 at 8:47 PM, Chris Olivier <cjolivie...@gmail.com>
wrote:

> I guess that was an oversight. rc2 has the full source.
> I remember it generally has the source because there were complaints that
> there were licensing issues with the dmlc files.
>
> On Tue, Jan 30, 2018 at 8:38 PM, Haibin Lin <haibin.lin@gmail.com>
> wrote:
>
> > @Chris Are pre-releases expected to include submodules? I checked
> 1.0.0.rc0
> > and 1.0.0.rc1, neither of them contain source code of submodules.
> >
> > On Tue, Jan 30, 2018 at 8:32 PM, Chris Olivier <cjolivie...@gmail.com>
> > wrote:
> >
> > > All other releases have the submodule sources included
> > >
> > > On Tue, Jan 30, 2018 at 8:00 PM, sandeep krishnamurthy <
> > > sandeep.krishn...@gmail.com> wrote:
> > >
> > > > I downloaded source from -
> > > > https://github.com/apache/incubator-mxnet/archive/1.1.0.rc0.tar.gz
> and
> > > > tried to build from source.
> > > > Build failed as submodules (ps-lite, mshadow) are empty directories in
> > the
> > > > source tar.
> > > >
> > > > Is this expected to not include submodules in source tar?
> > > >
> > > >
> > > > On Tue, Jan 30, 2018 at 4:15 PM, Marco de Abreu <
> > > > marco.g.ab...@googlemail.com> wrote:
> > > >
> > > > > +1 (binding)
> > > > >
> > > > > On Tue, Jan 30, 2018 at 3:20 PM, Indhu <indhubhara...@gmail.com>
> > > wrote:
> > > > >
> > > > > > +1 (binding)
> > > > > >
> > > > > >
> > > > > > On Tue, Jan 30, 2018 at 2:16 PM Chris Olivier <
> > cjolivie...@gmail.com
> > > >
> > > > > > wrote:
> > > > > >
> > > > > > > +1 (binding)
> > > > > > >
> > > > > > > On Sun, Jan 28, 2018 at 12:35 AM, Haibin Lin <
> > > > haibin.lin@gmail.com
> > > > > >
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Update:
> > > > > > > >
> > > > > > > > Link to release candidate 1.1.0.rc0:
> > > > > > > > https://dist.apache.org/repos/dist/dev/incubator/mxnet/1.1.
> > > 0.rc0/
> > > > > > > >
> > > > > > > > Link to the release tag:
> > > > > > > > https://github.com/apache/incubator-mxnet/tree/1.1.0.rc0
> > > > > > > >
> > > > > > > > Best,
> > > > > > > > Haibin
> > > > > > > >
> > > > > > > > On Sun, Jan 28, 2018 at 12:29 AM, Haibin Lin <
> > > > > haibin.lin@gmail.com
> > > > > > >
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > Hi everyone,
> > > > > > > > >
> > > > > > > > > Given that most people on dev@ are in favor of a minor
> > release
> > > > > > (1.1.0)
> > > > > > > > > instead of a patch release due to API changes, I'd like to
> > > > propose
> > > > > a
> > > > > > > vote
> > > > > > > > > to release Apache MXNet (incubating) 1.1.0. Voting will
> start
> > > now
> > > > > > > > (Sunday,
> > > > > > > > > January 28th) and end at 1pm Wednesday, January 31st PST.
> > > > > > > > >
> > > > > > > > > Link to release notes:
> > > > > > > > > https://cwiki.apache.org/confluence/display/MXNET/
> > > > > > > > > Apache+MXNet+%28incubating%29+1.1.0+Release+Notes
> > > > > > > > >
> > > > > > > > > Link to release candidate 1.1.0.rc0:
> > > > > > > > > https://github.com/apache/incubator-mxnet/releases/tag/
> > > 1.1.0.rc0
> > > > > > > > >
> > > > > > > > > View this page and scroll down to “Build from Source” with
> > > source
> > > > > > code
> > > > > > > > > obtained from the 1.1.0.rc0 tag:
> > > > > > > > > https://mxnet.incubator.apache.org/install/index.html
> > > > > > > > >
> > > > > > > > > (Note: The README.md points to the 1.1.0 tag and does not
> > work
> > > at
> > > > > the
> > > > > > > > > moment.)
> > > > > > > > >
> > > > > > > > > Please remember to TEST first before voting accordingly:
> > > > > > > > > +1 = approve
> > > > > > > > > +0 = no opinion
> > > > > > > > > -1 = disapprove (provide reason)
> > > > > > > > >
> > > > > > > > > Best,
> > > > > > > > > Haibin
> > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > > >
> > > >
> > > > --
> > > > Sandeep Krishnamurthy
> > > >
> > >
> >
>


Re: [VOTE] Release Apache MXNet (incubating) version 1.1.0.RC0

2018-01-30 Thread Haibin Lin
@Chris Are pre-releases expected to include submodules? I checked 1.0.0.rc0
and 1.0.0.rc1; neither of them contains source code of submodules.

On Tue, Jan 30, 2018 at 8:32 PM, Chris Olivier <cjolivie...@gmail.com>
wrote:

> All other releases have the submodule sources included
>
> On Tue, Jan 30, 2018 at 8:00 PM, sandeep krishnamurthy <
> sandeep.krishn...@gmail.com> wrote:
>
> > I downloaded source from -
> > https://github.com/apache/incubator-mxnet/archive/1.1.0.rc0.tar.gz and
> > tried to build from source.
> > Build failed as submodules (ps-lite, mshadow) are empty directories in the
> > source tar.
> >
> > Is this expected to not include submodules in source tar?
> >
> >
> > On Tue, Jan 30, 2018 at 4:15 PM, Marco de Abreu <
> > marco.g.ab...@googlemail.com> wrote:
> >
> > > +1 (binding)
> > >
> > > On Tue, Jan 30, 2018 at 3:20 PM, Indhu <indhubhara...@gmail.com>
> wrote:
> > >
> > > > +1 (binding)
> > > >
> > > >
> > > > On Tue, Jan 30, 2018 at 2:16 PM Chris Olivier <cjolivie...@gmail.com
> >
> > > > wrote:
> > > >
> > > > > +1 (binding)
> > > > >
> > > > > On Sun, Jan 28, 2018 at 12:35 AM, Haibin Lin <
> > haibin.lin@gmail.com
> > > >
> > > > > wrote:
> > > > >
> > > > > > Update:
> > > > > >
> > > > > > Link to release candidate 1.1.0.rc0:
> > > > > > https://dist.apache.org/repos/dist/dev/incubator/mxnet/1.1.
> 0.rc0/
> > > > > >
> > > > > > Link to the release tag:
> > > > > > https://github.com/apache/incubator-mxnet/tree/1.1.0.rc0
> > > > > >
> > > > > > Best,
> > > > > > Haibin
> > > > > >
> > > > > > On Sun, Jan 28, 2018 at 12:29 AM, Haibin Lin <
> > > haibin.lin@gmail.com
> > > > >
> > > > > > wrote:
> > > > > >
> > > > > > > Hi everyone,
> > > > > > >
> > > > > > > Given that most people on dev@ are in favor of a minor release
> > > > (1.1.0)
> > > > > > > instead of a patch release due to API changes, I'd like to
> > propose
> > > a
> > > > > vote
> > > > > > > to release Apache MXNet (incubating) 1.1.0. Voting will start
> now
> > > > > > (Sunday,
> > > > > > > January 28th) and end at 1pm Wednesday, January 31st PST.
> > > > > > >
> > > > > > > Link to release notes:
> > > > > > > https://cwiki.apache.org/confluence/display/MXNET/
> > > > > > > Apache+MXNet+%28incubating%29+1.1.0+Release+Notes
> > > > > > >
> > > > > > > Link to release candidate 1.1.0.rc0:
> > > > > > > https://github.com/apache/incubator-mxnet/releases/tag/
> 1.1.0.rc0
> > > > > > >
> > > > > > > View this page and scroll down to “Build from Source” with
> source
> > > > code
> > > > > > > obtained from the 1.1.0.rc0 tag:
> > > > > > > https://mxnet.incubator.apache.org/install/index.html
> > > > > > >
> > > > > > > (Note: The README.md points to the 1.1.0 tag and does not work
> at
> > > the
> > > > > > > moment.)
> > > > > > >
> > > > > > > Please remember to TEST first before voting accordingly:
> > > > > > > +1 = approve
> > > > > > > +0 = no opinion
> > > > > > > -1 = disapprove (provide reason)
> > > > > > >
> > > > > > > Best,
> > > > > > > Haibin
> > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> >
> >
> > --
> > Sandeep Krishnamurthy
> >
>


Re: [VOTE] Release Apache MXNet (incubating) version 1.1.0.RC0

2018-01-28 Thread Haibin Lin
Update:

Link to release candidate 1.1.0.rc0:
https://dist.apache.org/repos/dist/dev/incubator/mxnet/1.1.0.rc0/

Link to the release tag:
https://github.com/apache/incubator-mxnet/tree/1.1.0.rc0

Best,
Haibin

On Sun, Jan 28, 2018 at 12:29 AM, Haibin Lin <haibin.lin@gmail.com>
wrote:

> Hi everyone,
>
> Given that most people on dev@ are in favor of a minor release (1.1.0)
> instead of a patch release due to API changes, I'd like to propose a vote
> to release Apache MXNet (incubating) 1.1.0. Voting will start now (Sunday,
> January 28th) and end at 1pm Wednesday, January 31st PST.
>
> Link to release notes:
> https://cwiki.apache.org/confluence/display/MXNET/
> Apache+MXNet+%28incubating%29+1.1.0+Release+Notes
>
> Link to release candidate 1.1.0.rc0:
> https://github.com/apache/incubator-mxnet/releases/tag/1.1.0.rc0
>
> View this page and scroll down to “Build from Source” with source code
> obtained from the 1.1.0.rc0 tag:
> https://mxnet.incubator.apache.org/install/index.html
>
> (Note: The README.md points to the 1.1.0 tag and does not work at the
> moment.)
>
> Please remember to TEST first before voting accordingly:
> +1 = approve
> +0 = no opinion
> -1 = disapprove (provide reason)
>
> Best,
> Haibin
>
>


Re: Release plan - MXNET 1.0.1

2018-01-27 Thread Haibin Lin
Thanks Marco. I reviewed them and added many to the release note. See
https://cwiki.apache.org/confluence/display/MXNET/Apache+MXNet+%28incubating%29+1.1.0+Release+Notes



On Sat, Jan 27, 2018 at 12:20 AM, Marco de Abreu <
marco.g.ab...@googlemail.com> wrote:

> We might want to have a look at the following commits and add some of them
> to the release notes:
> https://github.com/apache/incubator-mxnet/commit/
> f960522fce032497ced6979ce7654f1f549d0434
> https://github.com/apache/incubator-mxnet/commit/
> c74cf1b3e3be8cfab7f92f646c9ac46ebe2ff6f8
> https://github.com/apache/incubator-mxnet/commit/
> 262c74c2c8228bfc60e0673bdf5ead0bf2e1973d
> https://github.com/apache/incubator-mxnet/commit/
> 3ac5376cbe14faa120d382be62d32c9c49a0baa0
> https://github.com/apache/incubator-mxnet/commit/
> 5251b861693402a3e6394da990a5af183b6f7247
> https://github.com/apache/incubator-mxnet/commit/
> 4600070cd35bf4f1f3b93f4ce349c8e34e3610ae
> https://github.com/apache/incubator-mxnet/commit/
> a80245d4d1a3005593456f743a34e396b774edbc
> https://github.com/apache/incubator-mxnet/commit/
> 12cb0d20c7feb0ba1aa6fa6dd1208af8f2fb230c
> https://github.com/apache/incubator-mxnet/commit/
> ddec3cc1ad4059016ca0a88e7f1b30bf48619d3a
> https://github.com/apache/incubator-mxnet/commit/
> 09ed385e5059ffea2671e7d8a24a390cee7c1f8a
> https://github.com/apache/incubator-mxnet/commit/
> c4a76da62fbc5e3e7272fd89294fba6b22868bba
> https://github.com/apache/incubator-mxnet/commit/
> 167871a135308971c22cb8f6bdc2c8e7477fda6e
> https://github.com/apache/incubator-mxnet/commit/
> b909769be11237e808c18e8afff55e7ab8877be9
> https://github.com/apache/incubator-mxnet/commit/
> d7da05b61adc9e4aba3e9995809b0d06965ae3bb
> https://github.com/apache/incubator-mxnet/commit/
> fdc0766971ed95811d0db15ad0d878998192fce5
>
> -Marco
>
> On Fri, Jan 26, 2018 at 10:28 PM, Haibin Lin <haibin.lin@gmail.com>
> wrote:
>
> > Hi everyone,
> >
> > Just some status update:
> >
> > 1. The two tests reported in #9553 are both fixed by @anirudh2290.
> >
> > 2. Many broken website links are fixed in #9575 by @thinksanky.
> >
> > 3. I made some updates to the release note. Please help edit the note
> > so that it reflects all changes in MXNet including new functionalities in
> > the contrib namespace.
> >
> >
> > Best,
> > Haibin
> >
> > On Thu, Jan 25, 2018 at 6:26 PM, Haibin Lin <haibin.lin@gmail.com>
> > wrote:
> >
> > > Hi everyone,
> > >
> > > Just some status update regarding the release:
> > >
> > > 1. More license fixes by @mbaijal are merged.
> > > https://github.com/apache/incubator-mxnet/pull/9554 for the perl
> package
> > > https://github.com/apache/incubator-mxnet/pull/9556 for the docs
> folder
> > > https://github.com/apache/incubator-mxnet/pull/9559 for the ci_build
> > > folder
> > >
> > > 2. Two tests failed intermittently, reported by @marcoabreu in another
> > > thread. We(@anirudh2290 and I) are looking into them.
> > > - test_operator_gpu.test_correlation
> > > <http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/
> > incubator-mxnet/detail/PR-9560/1/pipeline/>
> > > There were no changes for the test and implementation of this operator
> > for
> > > the past two months. Is anyone using this operator? We are currently
> > trying
> > > to reproduce the error.
> > > - test_forward.test_consistency
> > > <http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/
> > incubator-mxnet/detail/PR-9522/1/pipeline/>
> > > It looks like the downloaded numpy file is truncated. We'll make a PR
> to
> > > update the test.
> > >
> > > 3. More API changes since 1.0.0 reported by @szha
> > > - https://github.com/apache/incubator-mxnet/pull/9040
> > > <https://github.com/apache/incubator-mxnet/pull/9040>
> > > - https://github.com/apache/incubator-mxnet/pull/9420
> > > <https://github.com/apache/incubator-mxnet/pull/9420>
> > > - https://github.com/apache/incubator-mxnet/pull/9306
> > >
> > > Best,
> > > Haibin
> > >
> > > On Thu, Jan 25, 2018 at 8:08 AM, Steffen Rochel <
> steffenroc...@gmail.com
> > >
> > > wrote:
> > >
> > >> I support the proposal from Nan - this is a practical and productive
> > way.
> > >> I
> > >> include Nan's description into
> > >> https://cwiki.apache.org/confluence/display/MXNET/Release+Ve
> > >> rsioning+and+Branching
> >

Re: Release plan - MXNET 1.0.1

2018-01-26 Thread Haibin Lin
Hi everyone,

Just some status update:

1. The two tests reported in #9553 are both fixed by @anirudh2290.

2. Many broken website links are fixed in #9575 by @thinksanky.

3. I made some updates to the release note. Please help edit the note
so that it reflects all changes in MXNet including new functionalities in
the contrib namespace.


Best,
Haibin

On Thu, Jan 25, 2018 at 6:26 PM, Haibin Lin <haibin.lin@gmail.com>
wrote:

> Hi everyone,
>
> Just some status update regarding the release:
>
> 1. More license fixes by @mbaijal are merged.
> https://github.com/apache/incubator-mxnet/pull/9554 for the perl package
> https://github.com/apache/incubator-mxnet/pull/9556 for the docs folder
> https://github.com/apache/incubator-mxnet/pull/9559 for the ci_build
> folder
>
> 2. Two tests failed intermittently, reported by @marcoabreu in another
> thread. We(@anirudh2290 and I) are looking into them.
> - test_operator_gpu.test_correlation
> <http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/incubator-mxnet/detail/PR-9560/1/pipeline/>
> There were no changes for the test and implementation of this operator for
> the past two months. Is anyone using this operator? We are currently trying
> to reproduce the error.
> - test_forward.test_consistency
> <http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/incubator-mxnet/detail/PR-9522/1/pipeline/>
> It looks like the downloaded numpy file is truncated. We'll make a PR to
> update the test.
>
> 3. More API changes since 1.0.0 reported by @szha
> - https://github.com/apache/incubator-mxnet/pull/9040
> <https://github.com/apache/incubator-mxnet/pull/9040>
> - https://github.com/apache/incubator-mxnet/pull/9420
> <https://github.com/apache/incubator-mxnet/pull/9420>
> - https://github.com/apache/incubator-mxnet/pull/9306
>
> Best,
> Haibin
>
> On Thu, Jan 25, 2018 at 8:08 AM, Steffen Rochel <steffenroc...@gmail.com>
> wrote:
>
>> I support the proposal from Nan - this is a practical and productive way.
>> I
>> include Nan's description into
>> https://cwiki.apache.org/confluence/display/MXNET/Release+Ve
>> rsioning+and+Branching
>>
>>
>> We should make API changes very carefully and need to depend on the
>> community to flag any changes until we have better test coverage.
>>
>> Steffen
>>
>> On Thu, Jan 25, 2018 at 7:30 AM kellen sunderland <
>> kellen.sunderl...@gmail.com> wrote:
>>
>> > @Marco: Ok, well then it sounds like a good time for the community to
>> > re-think how we look for API changes ;-).  We can continue the chat in
>> the
>> > semver proposal, but maybe we could create a collection of APIs we
>> consider
>> > to be semver'd and review those interfaces each release.  Spending
>> reviewer
>> > and contributor time each PR on something that we ultimately ignore
>> doesn't
>> > seem productive.
>> >
>> > On Thu, Jan 25, 2018 at 4:29 PM, kellen sunderland <
>> > kellen.sunderl...@gmail.com> wrote:
>> >
>> > > Yes this is also what I'd suggest Nan, sorry if I wasn't clear.  My
>> > > comment was referring to 2.  So as an example for our current release
>> we
>> > > could cut a new minor release including new features such as the text
>> > api,
>> > > scala rename, but we could cherry-pick the important bug fixes and
>> apply
>> > > them to the 1.0.x branch.
>> > >
>> > > On Thu, Jan 25, 2018 at 4:22 PM, Nan Zhu <zhunanmcg...@gmail.com>
>> wrote:
>> > >
>> > >> regarding "I'd agree that we should apply most of the fixes to the
>> > >> previous
>> > >> release branch and build from there."
>> > >>
>> > >> my suggestion is actually a bit different with this, my idea is that
>> > >>
>> > >> 1. we always work with master branch, and when there is a date for
>> > >> releasing a new version (minor or major) one, we cut a new branch and
>> > >> announce code freeze for that version (of course, there is some
>> > exception
>> > >> to merge the previously-ignored blockers to the newly cut branch)
>> > >>
>> > >> 2. after the release, we still work in master for the next (at least
>> > minor
>> > >> version) and cautiously backport the patches to the last cut branch
>> > >> (assuming that we always maintain only one previous minor version)
>> > >>
>> > >> with this model, what we would do n

Re: Release plan - MXNET 1.0.1

2018-01-25 Thread Haibin Lin
> > to release this as a patch release.  The reason being that
> > organizations
> > >> > and users will know that they can apply this release without making
> > >> major
> > >> > changes to their dependencies.  It also helps set expectations
> around
> > >> the
> > >> > degree of regression testing you'd expect to do on a release
> > (typically
> > >> > patch releases would require less testing).  For that reason if we
> > >> release
> > >> > as a patch release I think we could expect better adoption rates in
> > the
> > >> > community and within large tech orgs.  If we release as a minor
> > release
> > >> we
> > >> > should expect that many customers may take a long time to update,
> and
> > >> as a
> > >> > community we will be forced to provide support for bugs which have
> > >> already
> > >> > been fixed.
> > >> >
> > >> > +1 (non-binding) In terms of branching I'd agree that we should
> apply
> > >> most
> > >> > of the fixes to the previous release branch and build from there.
> > >> Happy to
> > >> > help with this if needed.
> > >> >
> > >> > On Thu, Jan 25, 2018 at 6:19 AM, Nan Zhu <zhunanmcg...@gmail.com>
> > >> wrote:
> > >> >
> > >> > > +1 and suggest consolidating all maintenance releases under the
> same
> > >> > > major.minor version into a single branch
> > >> > >
> > >> > > On Wed, Jan 24, 2018 at 9:06 PM, Meghna Baijal <
> > >> > meghnabaijal2...@gmail.com
> > >> > > >
> > >> > > wrote:
> > >> > >
> > >> > > > I agree. If the release candidate is being cut from the master
> > >> branch,
> > >> > it
> > >> > > > should be considered a minor release.
> > >> > > >
> > >> > > > Anyway the effort involved in the release process is exactly the
> > >> same
> > >> > in
> > >> > > > either case.
> > >> > > >
> > >> > > > Thanks,
> > >> > > > Meghna
> > >> > > >
> > >> > > > On Jan 24, 2018 8:56 PM, "Marco de Abreu" <
> > >> > marco.g.ab...@googlemail.com>
> > >> > > > wrote:
> > >> > > >
> > >> > > > > Are there any particular reasons why we are classifying this
> > >> release
> > >> > as
> > >> > > > > patch instead of minor release? As far as I know, we don't
> have
> > >> any
> > >> > > tests
> > >> > > > > in place to determine API changes and thus can't guarantee
> that
> > >> this
> > >> > is
> > >> > > > an
> > >> > > > > actual patch release. Considering the fact that PRs have been
> > >> merged
> > >> > > > > without having semantic versioning in place, this could be
> quite
> > >> > risky.
> > >> > > > >
> > >> > > > > Instead, I'd rather propose to make a minor release 1.1
> instead
> > of
> > >> > > patch
> > >> > > > > release 1.0.1.
> > >> > > > >
> > >> > > > > -Marco
> > >> > > > >
> > >> > > > > Am 24.01.2018 7:20 nachm. schrieb "Zha, Sheng" <
> > >> zhash...@amazon.com
> > >> > >:
> > >> > > > >
> > >> > > > > > There’s an experimental API for text data indexing and
> > >> embedding in
> > >> > > > > > mx.contrib.text.
> > >> > > > > >
> > >> > > > > > - Sent by my thumb
> > >> > > > > >
> > >> > > > > > > On Jan 24, 2018, at 7:08 PM, Chris Olivier <
> > >> > cjolivie...@gmail.com>
> > >> > > > > > wrote:
> > >> > > > > > >
> > >> > > > > > > the profiling PR contains a small breaking change, but i
> > don’t
> > >> > > think
> > >> > >

Re: Release plan - MXNET 1.0.1

2018-01-24 Thread Haibin Lin
Hi everyone,

Since the plan was to cut a branch from the master branch, the code will
include changes other than the bug fix PRs noted in the release note. Is
anyone aware of any API changes in the current MXNet master branch? In
particular, are there backward incompatible ones?

Best,
Haibin
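
For reference, the semantic-versioning convention that frames this question — patch releases carry only bug fixes, minor releases add backward-compatible features, major releases may break APIs — can be sketched as follows. `classify_release` is an illustrative helper, not an MXNet utility:

```python
def classify_release(old: str, new: str) -> str:
    """Classify the bump from version `old` to `new` under semantic versioning."""
    o = tuple(int(p) for p in old.split("."))
    n = tuple(int(p) for p in new.split("."))
    if n[0] != o[0]:
        return "major"   # backward-incompatible API changes allowed
    if n[1] != o[1]:
        return "minor"   # new, backward-compatible features
    return "patch"       # bug fixes only

print(classify_release("1.0.0", "1.0.1"))  # patch
print(classify_release("1.0.0", "1.1.0"))  # minor
```

Under this convention, cutting 1.0.1 from master is only sound if master contains nothing beyond bug fixes — which is exactly the question above.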

On Tue, Jan 23, 2018 at 11:25 AM, Haibin Lin <haibin.lin@gmail.com>
wrote:

> Hi Sheng,
>
> 1. I've been following the discussion on the branching & versioning
> thread. Features like MKLDNN integration should not go to patch release
> 1.0.1, and it's risky to merge large PRs right before the release. I've
> removed the MKLDNN section from the release note. https://cwiki.apache.
> org/confluence/display/MXNET/Apache+MXNet+%28incubating%29+
> 1.0.1+Release+Notes
>
> 2. I agree that we should aim for better test coverage & stable CI, and
> get those disabled/flaky tests fixed eventually. Fixing these requires
> efforts from the community and I strongly encourage contributors to help.
> Removing the corresponding feature from the release doesn't sound practical
> since users might already be using some of those. I suggest that we keep
> track of these tests on Apache Wiki and make sure they are addressed for
> the release after 1.0.1.
>
> Hi everyone,
>
> In terms of the current status for this release, all critical bug fixes
> are merged (to the best of my knowledge) and we have made good progress
> fixing license issues. As Meghna mentioned, a list of open questions
> regarding license is at https://cwiki.apache.org/confluence/display/MXNET/
> MXNet+Source+Licenses section D - it would be great if we can get more
> clarification/help/feedback from Apache mentors.
>
> I suggest that we shoot for code freeze for 1.0.1 rc0 this Wednesday. Does
> anyone have concern or objection on this?
>
> Best,
> Haibin
>
> On Tue, Jan 23, 2018 at 7:51 AM, Steffen Rochel <steffenroc...@gmail.com>
> wrote:
>
>> Hi Sheng -
>> 1. branch usage and versioning - lets converge our discussion and document
>> the agreement on wiki. I started a draft summarizing my understanding of
>> the proposal at
>> https://cwiki.apache.org/confluence/display/MXNET/Release+
>> Versioning+and+Branching.
>> Lets work together to refine and clarify the draft, so we have clarity
>> going forward. I'm inviting everyone to contribute to this discussion.
>> As MKLDNN integration is not ready yet and we want to release all the good
>> improvements including updates in tutorials and documentation I suggest we
>> move forward with the release asap. As we don't have major features or
>> non-compatible API changes (to best of my knowledge) I think it is
>> appropriate to label the release as 1.0.1.
>> Note: This label indicates a patch release. Patch releases should be
>> created from the related release branch. As we didn't plan for it and to
>> minimize overhead I suggest we make a one time exception to cut the 1.0.1
>> release from master branch and clearly communicate in release notes. Going
>> forward we should follow the methodology for versioning and branching to
>> whatever we agree on.
>> 2. Disabled tests: I agree with your concerns that we had to disable 13
>> tests due to non-deterministic behavior (see issues
>> <https://github.com/apache/incubator-mxnet/labels/Flaky>). I'm calling on
>> all contributors to help to resolve the non-deterministic behavior, so we
>> can improve our test coverage. As we discussed offline, lets tests
>> manually
>> short term, document the known issue in the release notes and prioritize
>> efforts post 1.0.1 release.
>>
>> Regards,
>> Steffen
>>
>> On Wed, Jan 17, 2018 at 5:05 PM Sheng Zha <zhash...@apache.org> wrote:
>>
>> > Hi Haibin,
>> >
>> > Thanks for leading this. I suggest that we hold onto this release until
>> we
>> > have clarity on the following items.
>> >
>> > 1. branch usage and versioning
>> > Given that we are past 1.0 and we're changing APIs, I'd like to suggest
>> > that we first agree on how
>> > versioning works in mxnet. If we follow semantic versioning, it would
>> > suggest that features like
>> > MKL-DNN should go at least into 1.1 (minor version change) instead of
>> > 1.0.1 (patch release).
>> > Also, assuming that new release will come from a new forked branch, I
>> > suggest that we clarify on how to
>> > name the branches too.
>> > You can find relevant thread at
>> > https://lists.apache.org/thread.html/c52f8353f63c1e63b2646ec
>> 3b08d9a8180a1381787d777b41b8ac69f@%3Cdev.mxnet.apache.org%3E
>>

Re: Submitting changes to the MXNet website

2018-01-23 Thread Haibin Lin
Hi Marco,

Thanks for the clarification. I was wondering about the website build
process, too. Would it be beneficial to create a page on the Apache
wiki so that people not on the mailing list will also be aware of this?

Best,
Haibin

On Tue, Jan 23, 2018 at 1:08 PM, Marco de Abreu <
marco.g.ab...@googlemail.com> wrote:

> Hello,
>
> I have received a few questions regarding the publish and change process of
> the website at http://mxnet.incubator.apache.org/. Basically the process
> is
> as follows:
>
> Every night, the job at [1] downloads the current state of the master
> repository and executes the script located at [2]. The output is then
> picked up by the job at [3] and pushed to our repository at [4], which is
> mirrored to an ASF-repository and automatically published to the website.
> But there is one important catch during this publish-step:
>
> Every time the results get pushed to this website-repository, a 'rm -rf *;
> git rm -r *" is being executed, leading to all manual changes on the
> repository at [4] being overridden. This behaviour was copied from the old
> CI hosted under Apache Infra. I can only guess, but my bet for this design
> decision was probably to ensure consistency between the website and the
> docs in the MXNet main repository.
>
> So what does this mean for the community? In fact, PRs to the repository at
> [4] have no effect. Instead, please create a PR at [5] in order to get the
> changes actually published to the website.
>
> @Committers: Please decline any PRs at [4] and ask contributors to submit
> them to [5].
>
> Best regards,
> Marco
>
> [1]: http://jenkins.mxnet-ci.amazon-ml.com/job/incubator-mxnet-build-site/
> [2]:
> https://github.com/apache/incubator-mxnet/blob/master/
> docs/build_version_doc/build_doc.sh
> [3]: http://jenkins.mxnet-ci.amazon-ml.com/job/incubator-
> mxnet-publish-site/
> [4]: https://github.com/apache/incubator-mxnet-site
> [5]: https://github.com/apache/incubator-mxnet
>
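
The wipe-and-replace publish step Marco describes can be sketched in file-system terms. This is a hedged illustration of the idea only (the real job also involves git and Jenkins); `publish`, `build_dir`, and `site_dir` are hypothetical names:

```python
import shutil
import tempfile
from pathlib import Path

def publish(build_dir: Path, site_dir: Path) -> None:
    # Wipe the target first (analogous to `rm -rf *; git rm -r *`) so the
    # published tree exactly mirrors the nightly build output, with no
    # stale manual edits left behind.
    if site_dir.exists():
        shutil.rmtree(site_dir)
    shutil.copytree(build_dir, site_dir)

build = Path(tempfile.mkdtemp())
site = Path(tempfile.mkdtemp()) / "site"
(build / "index.html").write_text("<html>docs</html>")
site.mkdir()
(site / "manual_edit.html").write_text("this change will be lost")

publish(build, site)
print(sorted(p.name for p in site.iterdir()))  # ['index.html']
```

This is why PRs against the site repository have no effect: the next publish run replaces the whole tree.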


Re: Release plan - MXNET 1.0.1

2018-01-23 Thread Haibin Lin
Hi Sheng,

1. I've been following the discussion on the branching & versioning thread.
Features like MKLDNN integration should not go to patch release 1.0.1, and
it's risky to merge large PRs right before the release. I've removed the
MKLDNN section from the release note.
https://cwiki.apache.org/confluence/display/MXNET/Apache+MXNet+%28incubating%29+1.0.1+Release+Notes


2. I agree that we should aim for better test coverage & stable CI, and get
those disabled/flaky tests fixed eventually. Fixing these requires efforts
from the community and I strongly encourage contributors to help. Removing
the corresponding feature from the release doesn't sound practical since
users might already be using some of those. I suggest that we keep track of
these tests on Apache Wiki and make sure they are addressed for the release
after 1.0.1.

Hi everyone,

In terms of the current status for this release, all critical bug fixes are
merged (to the best of my knowledge) and we have made good progress fixing
license issues. As Meghna mentioned, a list of open questions regarding
license is at
https://cwiki.apache.org/confluence/display/MXNET/MXNet+Source+Licenses section
D - it would be great if we can get more clarification/help/feedback from
Apache mentors.

I suggest that we shoot for code freeze for 1.0.1 rc0 this Wednesday. Does
anyone have concern or objection on this?

Best,
Haibin

On Tue, Jan 23, 2018 at 7:51 AM, Steffen Rochel <steffenroc...@gmail.com>
wrote:

> Hi Sheng -
> 1. branch usage and versioning - lets converge our discussion and document
> the agreement on wiki. I started a draft summarizing my understanding of
> the proposal at
> https://cwiki.apache.org/confluence/display/MXNET/Release+Versioning+and+
> Branching.
> Lets work together to refine and clarify the draft, so we have clarity
> going forward. I'm inviting everyone to contribute to this discussion.
> As MKLDNN integration is not ready yet and we want to release all the good
> improvements including updates in tutorials and documentation I suggest we
> move forward with the release asap. As we don't have major features or
> non-compatible API changes (to best of my knowledge) I think it is
> appropriate to label the release as 1.0.1.
> Note: This label indicates a patch release. Patch releases should be
> created from the related release branch. As we didn't plan for it and to
> minimize overhead I suggest we make a one time exception to cut the 1.0.1
> release from master branch and clearly communicate in release notes. Going
> forward we should follow the methodology for versioning and branching to
> whatever we agree on.
> 2. Disabled tests: I agree with your concerns that we had to disable 13
> tests due to non-deterministic behavior (see issues
> <https://github.com/apache/incubator-mxnet/labels/Flaky>). I'm calling on
> all contributors to help to resolve the non-deterministic behavior, so we
> can improve our test coverage. As we discussed offline, lets tests manually
> short term, document the known issue in the release notes and prioritize
> efforts post 1.0.1 release.
>
> Regards,
> Steffen
>
> On Wed, Jan 17, 2018 at 5:05 PM Sheng Zha <zhash...@apache.org> wrote:
>
> > Hi Haibin,
> >
> > Thanks for leading this. I suggest that we hold onto this release until
> we
> > have clarity on the following items.
> >
> > 1. branch usage and versioning
> > Given that we are past 1.0 and we're changing APIs, I'd like to suggest
> > that we first agree on how
> > versioning works in mxnet. If we follow semantic versioning, it would
> > suggest that features like
> > MKL-DNN should go at least into 1.1 (minor version change) instead of
> > 1.0.1 (patch release).
> > Also, assuming that new release will come from a new forked branch, I
> > suggest that we clarify on how to
> > name the branches too.
> > You can find relevant thread at
> > https://lists.apache.org/thread.html/c52f8353f63c1e63b2646ec3b08d9a
> 8180a1381787d777b41b8ac69f@%3Cdev.mxnet.apache.org%3E
> >
> > 2. disabled tests
> > For the purpose of stabilizing test automation system, many tests were
> > disabled. In order to avoid
> > releasing untested features, we should mitigate the situation of having
> > disabled tests.
> > That means we can fix the tests before the release, or remove the
> > corresponding feature from release
> > (might be hard to do, e.g. for optimizer). Otherwise, we must
> collectively
> > decide that a feature is
> > OK to release without tests.
> > The thread on this topic can be found at
> > https://lists.apache.org/thread.html/addab1937bfcf09b5dfa15c1149ddc
> ebd084f1c4bf4e10a73770fb35@%3Cdev.mxnet.apache.org%3E
> >
> > We can proceed o

Please help update/review pending PRs

2018-01-22 Thread Haibin Lin
Hi everyone,

We still have a long list of outstanding PRs on GitHub. Contributors who
created the PR, could you all check your PR's status and resolve review
comments if any? Can everyone help review the pending PRs?

Best,
Haibin

=== list of open PRs on github =

zhanghang1989   7938
anirudh2290 8000 8738 9373
fhieber 8027
solin3198107 8423
yajiedesign 8114
kli-nlpr8218
benqua  8245
kevinthesun 8254
zheng-da8302
szha8377 9514
crazy-cat   8437
Prasad9 8484
DickJC123   8526
azai91  8527
munkim  8547
larroy  8578 8872 9035 9457
zhreshold   8582 8639
eftiquar8619
ptrendx 8652
xinghedyc   8797
wonghang8845
weixingzhang8849
joeddav 8912
Laurawly8915
ashokei 8918
eric-haibin-lin 8922 9481
taliesinb   8949
cjolivier01 8972 9498
rahul0039029 9049 9152
mbaijal 9046 9484 9500 9504 9505
anjishnu9111
jegalgo 9142
chaoyuaw9165
zihaolucky  9195
apache  9202 9485 9522
harshit98   9263
nehaljwani  9273
juliusshufan9305
katrinleinweber 9318
HectorSVC   9369
CodingCat   9389
tornadomeet 9420
pracheer9460
hubenjm 9470
chinakook   9492
asmushetzel 9495
lvaleriu9496
opringle9512
indhub  9519


Re: Proposal for treating warnings as errors in Linux & Clang builds (-Werror)

2018-01-15 Thread Haibin Lin
+1 (binding)

On Mon, Jan 15, 2018 at 9:43 AM, Marco de Abreu <
marco.g.ab...@googlemail.com> wrote:

> +1
>
> On Mon, Jan 15, 2018 at 6:27 PM, Pedro Larroy <
> pedro.larroy.li...@gmail.com>
> wrote:
>
> > Hi
> >
> > I would like to propose to compile in CI with warnings as errors for
> > increased code quality. This has a dual purpose:
> >
> > 1. Enforce a clean compilation output. Warnings often indicate
> > deficiencies in the code and hide new warnings which can be an
> > indicator of problems.
> >
> > 2. Warnings can surface bugs as has happened before.
> >
> > While this might be impractical in all architectures, I would propose
> > having the Linux and Clang build run without warnings in CI.
> >
> > I think we are very close to this as I personally have been fixing
> > warnings in Linux and OSX / Clang.
> >
> > References:
> >
> > https://github.com/apache/incubator-mxnet/pull/9398
> >
> > http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/
> > incubator-mxnet/detail/PR-9398/1/pipeline
> >
> > Pedro.
> >
>
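
The same principle applies beyond C++: most toolchains can escalate warnings into hard failures so they cannot be silently ignored. As a hedged illustration of the idea in Python (analogous in spirit to `-Werror`; not part of the proposal itself):

```python
import warnings

# Escalate all warnings to exceptions: a condition that would normally
# scroll past in the build log now fails fast and loudly.
warnings.simplefilter("error")

try:
    warnings.warn("this API is deprecated", DeprecationWarning)
    reached = True
except DeprecationWarning as exc:
    reached = False
    message = str(exc)

print(reached, message)  # False this API is deprecated
```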


Re: Release plan - MXNET 1.0.1

2018-01-12 Thread Haibin Lin
Hi Asmus,
 
I do understand where the concern comes from. PR 8302 is indeed a large PR that
changed a lot of code and still requires more effort to review and fully test
the changes.
 
To provide some context, PR 8302 is a refactoring and improvement of PR 7931 
based on review comments in 7931 so that the MKL integration is more modular 
and maintainable. Da (the main contributor of the PR) posted a design doc on 
Apache wiki to help people understand the design of the integration 
(https://cwiki.apache.org/confluence/display/MXNET/The+design+of+MKLDNN+integration).
 
There are still some final issues (code review and test coverage) to address
for this PR. In my opinion, we should try our best to address them in
the next few days, re-evaluate whether the integration is in a decent state,
and then make a decision on whether to include it in the 1.0.1 release.
 
In the meantime, I would like to strongly encourage everyone to help review the
outstanding PRs and reduce the backlog.
 
Best,
Haibin

On 2018-01-11 06:41, Asmus Hetzel <asmushet...@yahoo.de.INVALID> wrote: 
>  Hello Haibin, 
> we have the following in the release notes under performance improvements:
>      "Integrated MKLDNN for CPU training and inference acceleration"
> My impression is that this is what PR 8302 is about. I browsed through the 
> code and understand and agree with what this PR is trying to achieve.  But I 
> must admit that I would feel uncomfortable wrapping this up into a release 
> schedule that early. This PR touches 115 files in a sometimes intrusive way 
> and is not yet fully tested nor integrated into the master branch. Wrapping 
> such a PR in very late would put a big risk on the release that we should 
> only take when absolutely unavoidable. I personally would either remove this 
> from the release or otherwise move the release date. 
> Let me know if I misunderstood anything. 
> Regards
> Asmus
> 
> 
> 
>  
> 
> Am Donnerstag, 11. Januar 2018, 00:34:04 MEZ hat Haibin Lin 
> <haibin.lin@gmail.com> Folgendes geschrieben:  
>  
>  I am starting the process to prepare for MXNET 1.0.1 release. I have
> drafted release notes
> (https://cwiki.apache.org/confluence/display/MXNET/Apache+MXNet+%28incubating%29+1.0.1+Release+Notes)
> to cover the tasks under this release.
> 
> A release candidate will be cut on Monday 22nd Jan, 2018 and voting will
> commence from then till Thursday 25th Jan, 2018. If you have any additional
> features in progress and would like to include it in this release, please
> assure they have been merged by Thursday 18th Jan, 2018 with comment so I
> may update the release notes.
> 
> Feel free to add any other comments/suggestions.
> 
> Thanks,
> Haibin
>   


Re: R Build failure

2018-01-11 Thread Haibin Lin
+1 for using free datasets or datasets without license issues, and hosting them 
on S3 buckets to reduce external dependencies. 

On 2018-01-06 15:26, kellen sunderland  wrote: 
> FYI PRs are currently failing to build.  The R "Matrix Factorization" test
> is failing to download this dataset: http://files.grouplens.org/datasets/
> movielens/ml-100k.zip .  The site https://grouplens.org/ appears to be down.
> 
> Issue here: https://github.com/apache/incubator-mxnet/issues/9332
> PR to skip the test here:
> https://github.com/apache/incubator-mxnet/pull/9333
> 
> -Kellen
> 


Re: Test failures due to mxnet.text

2018-01-11 Thread Haibin Lin
I noticed that, too. I pinged the contributor to investigate the cause of the 
failure. Thanks for reporting this, Marco.

Best,
Haibin
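
For context, this failure pattern — a truncated embedding file whose later lines carry fewer values than the embedding dimension — is what the assertion in the traceback below guards against. A hedged, simplified sketch of that kind of per-line consistency check (illustrative only, not the actual `mxnet.text` implementation; `load_embedding` is a hypothetical helper):

```python
def load_embedding(lines, dim=None):
    """Parse `token v1 v2 ...` lines, enforcing a uniform vector length.

    Returns {token: [float, ...]}; raises ValueError on the first line
    whose dimension disagrees with previous lines (e.g. because the file
    was truncated mid-download).
    """
    vectors = {}
    for line_num, line in enumerate(lines, start=1):
        token, *elems = line.split()
        if dim is None:
            dim = len(elems)  # first line fixes the expected dimension
        if len(elems) != dim:
            raise ValueError(
                "At line %d: token %s has dimension %d but previous "
                "tokens have dimension %d" % (line_num, token, len(elems), dim)
            )
        vectors[token] = [float(e) for e in elems]
    return vectors

good = load_embedding(["the 0.1 0.2", "cat 0.3 0.4"])
print(good["cat"])  # [0.3, 0.4]
```

A check like this turns a corrupted download into an early, explicit error rather than silent bad embeddings.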


On 2018-01-11 13:45, Marco de Abreu  wrote: 
> Hello,
> 
> apparently, the recently introduced mxnet.text API introduces test failures
> https://github.com/apache/incubator-mxnet/pull/8763. It would be great if
> the two following issues could be investigated:
> http://jenkins.mxnet-ci.amazon-ml.com/blue/rest/organizations/jenkins/pipelines/incubator-mxnet/branches/master/runs/175/nodes/336/steps/629/log/?start=0
> 
> test_text.test_glove ... FAIL
> Traceback (most recent call last):
> 
>   File "C:\Anaconda3\envs\py2\lib\site-packages\nose\case.py", line 197, in
> runTest
> 
> self.test(*self.arg)
> 
>   File
> "C:\jenkins_slave\workspace\ut-python-cpu\tests\python\unittest\test_text.py",
> line 125, in test_glove
> 
> 'glove', pretrained_file_name='glove.6B.50d.txt')
> 
>   File
> "C:\jenkins_slave\workspace\ut-python-cpu\pkg_vc14_cpu\python\mxnet\text\embedding.py",
> line 371, in create
> 
> return create_text_embedding(embedding_name, **kwargs)
> 
>   File
> "C:\jenkins_slave\workspace\ut-python-cpu\pkg_vc14_cpu\python\mxnet\registry.py",
> line 163, in create
> 
> return registry[name](*args, **kwargs)
> 
>   File
> "C:\jenkins_slave\workspace\ut-python-cpu\pkg_vc14_cpu\python\mxnet\text\embedding.py",
> line 538, in __init__
> 
> self._load_embedding(pretrained_file_path, ' ', init_unknown_vec)
> 
>   File
> "C:\jenkins_slave\workspace\ut-python-cpu\pkg_vc14_cpu\python\mxnet\text\embedding.py",
> line 201, in _load_embedding
> 
> % (line_num, token, len(elems), vec_len)
> 
> AssertionError: At line 321803 of the pre-trained token embedding file: the
> dimension of token nonslip is 7 but the dimension of previous tokens is 50.
> Dimensions of all the tokens must be the same.
> 
>  >> begin captured logging << 
> 
> root: INFO: Loading pre-trained token embedding vectors from
> C:\Windows\system32\config\systemprofile\.mxnet\embeddings\glove\glove.6B.50d.txt
> 
> - >> end captured logging << -
> 
> 
> Also, we got a skipped test:
> test_text.test_fasttext ...
> C:\jenkins_slave\workspace\ut-python-cpu\pkg_vc14_cpu\python\mxnet\text\embedding.py:188:
> UserWarning: At line 1 of the pre-trained text embedding file: token 111051
> with 1-dimensional vector [300.0] is likely a header and is skipped.
> 
>   'skipped.' % (line_num, token, elems))
> 
> 
> 
> Thank you
> 
> -Marco
> 


Release plan - MXNET 1.0.1

2018-01-10 Thread Haibin Lin
I am starting the process to prepare for MXNET 1.0.1 release. I have
drafted release notes
(https://cwiki.apache.org/confluence/display/MXNET/Apache+MXNet+%28incubating%29+1.0.1+Release+Notes)
to cover the tasks under this release.

A release candidate will be cut on Monday 22nd Jan, 2018 and voting will
commence from then until Thursday 25th Jan, 2018. If you have any additional
features in progress that you would like to include in this release, please
ensure they are merged by Thursday 18th Jan, 2018 with a comment, so I
can update the release notes.

Feel free to add any other comments/suggestions.

Thanks,
Haibin


Re: Status of Sparse Tensor Support in MXNet

2017-09-27 Thread Haibin Lin
I'm not sure why hyperlinks don't work well with the mailing list. Here's a
duplicate without the links.

I’ve been working on sparse tensor support in MXNet. I’d like to share a
bit regarding what I worked on and gather some inputs/feature requests from
the community.

Recently sparse tensor CPU support has been merged to MXNet master with:
- Two sparse data formats: Compressed Sparse Row (CSR, for sparse inputs)
and Row Sparse (for sparse gradients)
- Two data iterators for sparse data input: NDArrayIter and LibSVMIter
- Three optimizers for sparse gradient updates: Ftrl (@CNevd), SGD and Adam
- Sparse storage conversion, matrix-matrix product, matrix-vector product,
and sparse gradient aggregation operators (CPU @reminisce, GPU
@stefanhenneking)
- Many sparse element-wise CPU operators including arithmetic (e.g.
elemwise_add), rounding, trigonometric, hyperbolic, exponents, logarithms,
and power operators (mainly implemented for Row Sparse but not yet for CSR
@cjolivier01).
- Distributed kvstore with sparse push/pull (CPU only, 64-bit hashed keys
not supported for distributed training)
- Distributed linear regression example with sparse data
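
To make the CSR format concrete: CSR stores only the non-zero values together with their column indices and per-row offsets, so a matrix-vector product skips all zeros. A minimal pure-Python sketch of the layout (illustrative of the format itself, not the MXNet API):

```python
def csr_matvec(data, indices, indptr, x):
    """y = A @ x for a matrix A stored in CSR form.

    data    -- non-zero values, laid out row by row
    indices -- column index of each entry in `data`
    indptr  -- indptr[i]:indptr[i+1] slices row i out of data/indices
    """
    n_rows = len(indptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        for k in range(indptr[i], indptr[i + 1]):
            y[i] += data[k] * x[indices[k]]
    return y

# Dense equivalent: [[1, 0, 2],
#                    [0, 0, 3],
#                    [4, 5, 0]]
data = [1.0, 2.0, 3.0, 4.0, 5.0]
indices = [0, 2, 2, 0, 1]
indptr = [0, 2, 3, 5]
print(csr_matvec(data, indices, indptr, [1.0, 1.0, 1.0]))  # [3.0, 3.0, 9.0]
```

For very sparse inputs this stores and multiplies only the non-zeros, which is why CSR suits sparse input data such as LibSVM files.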

There’re also some ongoing benchmarking efforts for matrix multiplication,
memory usage and distributed training within MXNet (@anirudh2290) and
tutorials regarding basic sparse operations (work in progress, comments are
welcome).

The future work I have in mind includes:
- Update document to reflect available sparse operators and benchmark
results
- Sparse embedding operator
- Adagrad optimizer for sparse gradient updates
- Reduce sum operator for CSR
- Gluon interface support
- Factorization machine example
- Noise contrastive estimation example

What sparse related features and operator support would you need and what
do you want to use it for? Do you want any item in the list of future work
to become available sooner? Any feedback is welcome. Thanks a lot.

Best,
Haibin


On Wed, Sep 27, 2017 at 10:12 AM, Haibin Lin <haibin.lin@gmail.com>
wrote:

> (It looks like the previous email didn’t go through. Resending it)
>
>
>
> Hi everyone,
>
>
>
> I’ve been working on sparse tensor support in MXNet. I’d like to share a
> bit regarding what I worked on and gather some inputs/feature requests from
> the community.
>
>
>
> Recently sparse tensor CPU support has been merged to MXNet master with:
>
>- Two sparse data formats: Compressed Sparse Row
>
> <https://mxnet.incubator.apache.org/versions/master/api/python/ndarray/sparse.html#mxnet.ndarray.sparse.CSRNDArray>(CSR,
>for sparse inputs) and Row Sparse
>
> <https://mxnet.incubator.apache.org/versions/master/api/python/ndarray/sparse.html#mxnet.ndarray.sparse.RowSparseNDArray>
>  (for
>sparse gradients)
>- Two data iterators for sparse data input: NDArrayIter
>
> <https://mxnet.incubator.apache.org/versions/master/api/python/io/io.html#mxnet.io.NDArrayIter>
> and LibSVMIter
>
> <https://mxnet.incubator.apache.org/versions/master/api/python/io/io.html#mxnet.io.LibSVMIter>
>- Three optimizers for sparse gradient updates: Ftrl
>
> <https://mxnet.incubator.apache.org/versions/master/api/python/optimization/optimization.html#mxnet.optimizer.Ftrl>
>(@CNevd), SGD
>
> <https://mxnet.incubator.apache.org/versions/master/api/python/optimization/optimization.html#mxnet.optimizer.SGD>
> and Adam
>
> <https://mxnet.incubator.apache.org/versions/master/api/python/optimization/optimization.html#mxnet.optimizer.Adam>
>- Sparse storage conversion
>
> <https://mxnet.incubator.apache.org/versions/master/api/python/ndarray/sparse.html#mxnet.ndarray.sparse.cast_storage>
>, matrix-matrix product
>
> <https://mxnet.incubator.apache.org/versions/master/api/python/ndarray/sparse.html#mxnet.ndarray.sparse.dot>
>, matrix-vector product
>
> <https://mxnet.incubator.apache.org/versions/master/api/python/ndarray/sparse.html#mxnet.ndarray.sparse.dot>,
>and sparse gradient aggregation
>
> <https://mxnet.incubator.apache.org/versions/master/api/python/ndarray/sparse.html#mxnet.ndarray.sparse.add_n>
>  operators
>(CPU @reminisce, GPU @stefanhenneking)
>- Many sparse element-wise CPU operators including: arithmetic (e.g.
>elemwise_add), rounding, trigonometric, hyperbolic, exponents,
>logarithms, and power operators (mainly implemented for Row Sparse but not
>yet for CSR @cjolivier01).
>- Distributed kv-store with sparse push
>
> <https://mxnet.incubator.apache.org/versions/master/api/python/kvstore/kvstore.html#mxnet.kvstore.KVStore.push>
>/pull
>
> <https://mxnet.incubator.apache.org/versions/master/api/python/kvstore/kvstore.html#mxnet.kvstore.KVStore.row_s

Status of Sparse Tensor Support in MXNet

2017-09-27 Thread Haibin Lin
(It looks like the previous email didn’t go through. Resending it)



Hi everyone,



I’ve been working on sparse tensor support in MXNet. I’d like to share a
bit regarding what I worked on and gather some inputs/feature requests from
the community.



Recently sparse tensor CPU support has been merged to MXNet master with:

   - Two sparse data formats: Compressed Sparse Row (CSR, for sparse inputs)
     and Row Sparse (for sparse gradients)
   - Two data iterators for sparse data input: NDArrayIter and LibSVMIter
   - Three optimizers for sparse gradient updates: Ftrl (@CNevd), SGD and Adam
   - Sparse storage conversion, matrix-matrix product, matrix-vector product,
     and sparse gradient aggregation operators (CPU @reminisce, GPU
     @stefanhenneking)
   - Many sparse element-wise CPU operators including: arithmetic (e.g.
     elemwise_add), rounding, trigonometric, hyperbolic, exponents,
     logarithms, and power operators (mainly implemented for Row Sparse but
     not yet for CSR @cjolivier01).
   - Distributed kv-store with sparse push/pull (CPU only, 64-bit hashed keys
     not supported for distributed training)
   - Distributed linear regression example with sparse data
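
To illustrate the Row Sparse format listed above: a row-sparse gradient stores only its non-zero rows, so an optimizer can update just those rows of a large weight matrix instead of touching every row. A minimal pure-Python sketch of the idea (illustrative only, not the MXNet API; `sparse_sgd_update` is a hypothetical helper):

```python
def sparse_sgd_update(weight, row_idx, grad_rows, lr=0.1):
    """In-place SGD step touching only the rows present in the gradient.

    weight    -- dense weight matrix as a list of rows
    row_idx   -- indices of the non-zero gradient rows
    grad_rows -- the corresponding gradient rows (same order as row_idx)
    """
    for i, g in zip(row_idx, grad_rows):
        weight[i] = [w - lr * gv for w, gv in zip(weight[i], g)]

weight = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
# Row-sparse gradient: only rows 0 and 2 are non-zero, row 1 is untouched.
sparse_sgd_update(weight, row_idx=[0, 2], grad_rows=[[10.0, 10.0], [10.0, 10.0]])
print(weight)  # [[0.0, 0.0], [2.0, 2.0], [2.0, 2.0]]
```

This is the pattern behind sparse optimizer updates such as the SGD, Adam, and Ftrl variants listed above: when only a few embedding rows receive gradients, only those rows are read and written.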



There’re also some ongoing benchmarking efforts for matrix multiplication,
memory usage and distributed training within MXNet (@anirudh2290) and
tutorials regarding
basic sparse operations (work in progress, comments are welcome).



The future work I have in mind includes:

   - Update document to reflect available sparse operators and benchmark
   results
   - Sparse embedding operator
   - Adagrad optimizer for sparse gradient updates
   - Reduce sum operator for CSR
   - Gluon interface support
   - Factorization machine example
   - Noise contrastive estimation example



What sparse-related features and operator support would you need, and what
do you want to use it for? Do you want any item in the list of future work
to become available sooner? Any feedback is welcome. Thanks a lot.



Best,

Haibin