RE: [Announcement] New Committer - Sam Skalicky

2020-07-29 Thread Zhao, Patric
Congratulations, Sam, and thanks for all of your great work in MXNet!

> -Original Message-
> From: Chaitanya Bapat 
> Sent: Thursday, July 30, 2020 1:12 AM
> To: dev@mxnet.incubator.apache.org
> Subject: Re: [Announcement] New Committer - Sam Skalicky
> 
> Congratulations Sam! Well deserved!
> 
> On Wed, 29 Jul 2020 at 08:05, Marco de Abreu 
> wrote:
> 
> > Welcome!
> >
> > -Marco
> >
> > On Wed, Jul 29, 2020, 4:58 PM sandeep krishnamurthy <
> > sandeep.krishn...@gmail.com> wrote:
> >
> > > Hello all,
> > >
> > > Please join me in welcoming Sam Skalicky(@samskalicky) as a new
> > > committer of Apache MXNet (incubating)!
> > >
> > > Sam has made a number of contributions to this project such as
> > > SubGraphs, Custom Ops, Accelerator APIs, along with several other
> > > operator implementations and bug fixes. Sam has been actively
> > > engaging in PR reviews, dev@ list discussions and helping the
> > > project and fellow contributors.
> > >
> > > Sam, thank you for all your contributions and looking forward to
> > > more support!
> > >
> > > Welcome, Sam!
> > >
> > > --
> > > Sandeep Krishnamurthy
> > >
> >
> 
> 
> --
> *Chaitanya Prakash Bapat*
> *+1 (973) 953-6299*
> 
> 


RE: assimilation of mshadow into the MXNet codebase

2020-07-26 Thread Zhao, Patric
Several people in the list below are from Intel, and I have added them to CC.

Sheng, you can contact them for the ICLAs.

Thanks,

--Patric

> -Original Message-
> From: Sheng Zha 
> Sent: Monday, July 27, 2020 5:33 AM
> To: Justin Mclean 
> Cc: d...@mxnet.apache.org; Wall Michael ; Bob Paulin
> ; wei...@apache.org; jason...@apache.org; Chen, Ciyong
> 
> Subject: Re: assimilation of mshadow into the MXNet codebase
> 
> Hi,
> 
> Here's an update on this issue. We are still missing the ICLAs from 32 (out of 70)
> mshadow contributors, accounting for a total of 62 (out of 913) commits
> (@ap-hynninen passed away a few years ago and is not included). I reached out to
> them through email and other channels to collect ICLAs for mshadow. I will wait
> for a day or two before updating on the progress again, and we can decide then
> whether we are good to start the IP clearance.
> 
> The complete list of mshadow contributors' GitHub logins that are missing ICLA
> is here ("#commits @github-login"):
> 
> 8 @Lorrainexun
> 6 @tornadomeet
> 5 @asmushetzel
> 3 @zhenlinluo
> 3 @stefanhenneking
> 3 @jpauwels
> 3 @hjk41
> 3 @DrustZ
> 2 @zhangchen-qinyinghua
> 2 @yinghu5
> 2 @reyoung
> 2 @forwchen
> 1 @yupbank
> 1 @yllan
> 1 @xinyu-intel
> 1 @xingmingjie
> 1 @xianyi
> 1 @tdomhan
> 1 @siemanko
> 1 @qiaohaijun
> 1 @maxint
> 1 @loveisp
> 1 @lebeg
> 1 @kdavis-mozilla
> 1 @kaleidoscopical
> 1 @jason-xuan
> 1 @happynear
> 1 @glingyan
> 1 @asitstands
> 1 @antoine-wdg-rmz
> 1 @alextnewman
> 1 @Harmonicahappy
> 
> Best,
> Sheng
> 
> On Thu, Jul 23, 2020 at 12:28 AM Sheng Zha  wrote:
> 
> > Hi,
> >
> > No, I don’t think we used ICLAs for mshadow before.
> >
> > Out of the 42 people who made more than 1 commit or more than 10 lines
> > of code change to mshadow, 26 signed an ICLA with Apache (and
> > additionally one member is unfortunately deceased...). Would this be a
> > better criterion for "the major ones"? I wasn't part of the initial code
> > donation or the initial PPMC group, so apologies if the questions were
> > silly.
> >
> > I think the rest of the commits are manageable so that I could do a
> > revert and rework for those commits if/when necessary.
> >
> > Regards,
> > Sheng
> >
> > > On Jul 22, 2020, at 11:50 PM, Justin Mclean
> > > 
> > wrote:
> > >
> > > Hi,
> > >
> > >> Thanks for clarifying. All contributors who made more than 10
> > >> commits to mshadow before are committers of MXNet, so their ICLAs
> > >> should already be on file: tqchen, bingxu, eric.xie, sxjscience, mli,
> > >> yajiedesign [1]. If you think this is OK, one of the mentors or I can
> > >> start the notification.
> > >
> > >
> > > What about the other 60 contributors? More than 10 commits is not a
> > > line I would feel comfortable with. You need to be able to account for
> > > the IP provenance of every line of code, just like in your initial code
> > > donation. It would probably be best to make a list of all contributors
> > > and whether they have an ICLA or not. Did the mshadow project use
> > > ICLAs? If so, that may also help.
> > >
> > > Thanks,
> > > Justin
> >


RE: [VOTE] Release Apache MXNet (incubating) version 1.7.0.rc1

2020-07-19 Thread Zhao, Patric
+1

Passed the performance benchmarking for the CPU tests, and no regression was found.


> -Original Message-
> From: Aston Zhang 
> Sent: Sunday, July 19, 2020 1:45 PM
> To: dev@mxnet.incubator.apache.org
> Cc: d...@mxnet.apache.org; Bob Paulin ; Henri Yandell
> ; Jason Dai ; Markus Weimer
> ; Michael Wall 
> Subject: Re: [VOTE] Release Apache MXNet (incubating) version 1.7.0.rc1
> 
> +1
> Passed d2l-en v0.14.1: https://github.com/d2l-ai/d2l-en/releases/tag/v0.14.1
> 
> On Thu, Jul 16, 2020 at 2:34 AM Chen, Ciyong  wrote:
> 
> > Dear MXNet community,
> >
> > This is the vote to release Apache MXNet (incubating) version 1.7.0.
> > Voting will start 16th July 23:59:59 PST and close on 19th July
> > 23:59:59 PST.
> >
> > Link to release notes:
> > https://cwiki.apache.org/confluence/display/MXNET/1.7.0+Release+notes
> >
> > Link to release candidate:
> > https://github.com/apache/incubator-mxnet/releases/tag/1.7.0.rc1
> >
> > Link to source and signatures on apache dist server:
> > https://dist.apache.org/repos/dist/dev/incubator/mxnet/1.7.0.rc1
> >
> > Please remember to TEST first before voting accordingly:
> > +1 = approve
> > +0 = no opinion
> > -1 = disapprove (provide reason)
> >
> > Here's the changes comparing to 1.7.0.rc0:
> >
> >   *   Revert "Fix memory leaks in Gluon (#18328) (#18358) (#18692)
> >   *   revise activations (#18700)
> >   *   Fix the monitor_callback invalid issue during calibration with
> > variable input shapes (#18632) (#18703)
> >
> >
> > Best regards,
> > Ciyong Chen
> >


RE: Global Search Now Available on MXNet Website

2020-05-21 Thread Zhao, Patric
I have tried it, and it's really useful!

Thanks for the improvements, Yang.

> -Original Message-
> From: sandeep krishnamurthy 
> Sent: Thursday, May 21, 2020 2:42 PM
> To: dev@mxnet.incubator.apache.org
> Cc: d...@mxnet.apache.org
> Subject: Re: Global Search Now Available on MXNet Website
> 
> This is so very helpful Yang. Thank you so much for contributing this :-)
> 
> On Wed, 20 May 2020, 10:36 pm Lin Yuan,  wrote:
> 
> > Awesome work! Thanks a lot for making this desirable feature happen.
> >
> > Lin
> >
> > On Wed, May 20, 2020 at 8:45 AM Yang Shi 
> wrote:
> >
> > > Hi MXNet Community,
> > >
> > > A global search feature has been added to the main information pages
> > > of the MXNet website. It can search content across the whole site in
> > > any version. Currently it is available on the master website, and it
> > > will be supported on the v1.6 website shortly.
> > >
> > > Best regards,
> > > Yang
> > >
> >


RE: Stopping nightly releases to Pypi

2019-12-26 Thread Zhao, Patric
Agree, we should add a selection for nightly builds to the installation page.

https://mxnet.apache.org/get_started?version=master=linux=python=pip=cpu#


> -Original Message-
> From: Haibin Lin 
> Sent: Tuesday, December 17, 2019 2:40 PM
> To: dev@mxnet.incubator.apache.org
> Cc: d...@mxnet.apache.org
> Subject: Re: Stopping nightly releases to Pypi
> 
> Shall we update the website installation page with nightly build information
> as well (after we figure out the CD details)?
> 
> Best,
> Haibin
> 
> On Tue, Dec 10, 2019 at 10:15 PM Lausen, Leonard
> 
> wrote:
> 
> > Not yet. As a community, we first need to add the nightly build
> > hosting feature to the community run CD and then we can add the page
> > so that the exact date doesn't need to be specified.
> >
> > I'm not sure what steps are required for this. Do we need to host the
> > artifacts on Apache's infrastructure? Or can we host the nightly CD
> > artifacts as part of the AWS sponsored community-maintained CD (S3
> > bucket associated to the account)?
> >
> > In the meantime, the "proprietary" AWS build solution could be
> > extended to publish an html page per artifact type (mxnet,
> > mxnet-cu100, ...) containing a link to all recent builds.
> >
> > Best regards
> > Leonard
> >
> > On Tue, 2019-12-10 at 22:03 -0800, Lin Yuan wrote:
> > > Is there a way to install the latest nightly package without having
> > > to specify exact date?
> > >
> > > Thanks,
> > >
> > > Lin
> > >
> > > On Sun, Dec 8, 2019 at 6:13 PM Lausen, Leonard  wrote:
> > >
> > > > From Shanghai, the closest endpoint (automatically chosen endpoint) is in Tokyo,
> > > > and the download speed for mxnet-mkl was on average 1.7 MB/s, with a maximum
> > > > of 5 MB/s during my test.
> > > >
> > > > On Sun, 2019-12-08 at 01:30 +, Sheng Zha wrote:
> > > > > > Here's a set of links for today's builds
> > > > > >
> > > > > > (Plain mxnet, no mkl no cuda)
> > > > > > https://apache-mxnet.s3-us-west-2.amazonaws.com/dist/2019-12-07/dist/mxnet-1.6.0b20191207-py2.py3-none-manylinux1_x86_64.whl
> > > > > > (mxnet-mkl)
> > > > > > https://apache-mxnet.s3-us-west-2.amazonaws.com/dist/2019-12-07/dist/mxnet_mkl-1.6.0b20191207-py2.py3-none-manylinux1_x86_64.whl
> > > > > > (mxnet-cuXXX)
> > > > > > https://apache-mxnet.s3-us-west-2.amazonaws.com/dist/2019-12-07/dist/mxnet_cu90-1.6.0b20191207-py2.py3-none-manylinux1_x86_64.whl
> > > > > > https://apache-mxnet.s3-us-west-2.amazonaws.com/dist/2019-12-07/dist/mxnet_cu92-1.6.0b20191207-py2.py3-none-manylinux1_x86_64.whl
> > > > > > https://apache-mxnet.s3-us-west-2.amazonaws.com/dist/2019-12-07/dist/mxnet_cu100-1.6.0b20191207-py2.py3-none-manylinux1_x86_64.whl
> > > > > > https://apache-mxnet.s3-us-west-2.amazonaws.com/dist/2019-12-07/dist/mxnet_cu101-1.6.0b20191207-py2.py3-none-manylinux1_x86_64.whl
> > > > > > (mxnet-cuXXXmkl)
> > > > > > https://apache-mxnet.s3-us-west-2.amazonaws.com/dist/2019-12-07/dist/mxnet_cu90mkl-1.6.0b20191207-py2.py3-none-manylinux1_x86_64.whl
> > > > > > https://apache-mxnet.s3-us-west-2.amazonaws.com/dist/2019-12-07/dist/mxnet_cu92mkl-1.6.0b20191207-py2.py3-none-manylinux1_x86_64.whl
> > > > > > https://apache-mxnet.s3-us-west-2.amazonaws.com/dist/2019-12-07/dist/mxnet_cu100mkl-1.6.0b20191207-py2.py3-none-manylinux1_x86_64.whl
> > > > > > https://apache-mxnet.s3-us-west-2.amazonaws.com/dist/2019-12-07/dist/mxnet_cu101mkl-1.6.0b20191207-py2.py3-none-manylinux1_x86_64.whl
> > > > > These links are not utilizing the s3 accelerate feature (i.e. not backed by
> > > > > cloudfront edges). Please use repo.mxnet.io instead. The updated links are:
> > > > > (Plain mxnet, no mkl no cuda)
> > > > > https://repo.mxnet.io/dist/2019-12-07/dist/mxnet-1.6.0b20191207-py2.py3-none-manylinux1_x86_64.whl
> > > > > (mxnet-mkl)
> > > > > https://repo.mxnet.io/dist/2019-12-07/dist/mxnet_mkl-1.6.0b20191207-py2.py3-none-manylinux1_x86_64.whl
> > > > > (mxnet-cuXXX)
> > > > > https://repo.mxnet.io/dist/2019-12-07/dist/mxnet_cu90-1.6.0b20191207-py2.py3-none-manylinux1_x86_64.whl
> > > > > https://repo.mxnet.io/dist/2019-12-07/dist/mxnet_cu92-1.6.0b20191207-py2.py3-none-manylinux1_x86_64.whl
> > > > > https://repo.mxnet.io/dist/2019-12-07/dist/mxnet_cu100-1.6.0b20191207-py2.py3-none-manylinux1_x86_64.whl
> > > > > https://repo.mxnet.io/dist/2019-12-07/dist/mxnet_cu101-1.6.0b20191207-py2.py3-none-manylinux1_x86_64.whl
> > > > > (mxnet-cuXXXmkl)
> > > > > https://repo.mxnet.io/dist/2019-12-07/dist/mxnet_cu90mkl-1.6.0b20191207-py2.py3-none-manylinux1_x86_64.whl
> > > > > https://repo.mxnet.io/dist/2019-12-07/dist/mxnet_cu92mkl-1.6.0b20191207-py2.py3-none-manylinux1_x86_64.whl
> > 
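The dated wheel links above all follow one URL pattern. As a sketch, a helper like the hypothetical `build_nightly_url` below (not an official tool; the host and path layout are copied from the repo.mxnet.io links in this thread) can assemble a link for any date and variant:

```python
# Build a dated nightly wheel URL following the pattern of the links above.
# build_nightly_url is a hypothetical helper for illustration only.
def build_nightly_url(date: str, variant: str = "mxnet", version: str = "1.6.0") -> str:
    tag = date.replace("-", "")  # "2019-12-07" -> "20191207"
    wheel = f"{variant}-{version}b{tag}-py2.py3-none-manylinux1_x86_64.whl"
    return f"https://repo.mxnet.io/dist/{date}/dist/{wheel}"

print(build_nightly_url("2019-12-07", "mxnet_cu100"))
# https://repo.mxnet.io/dist/2019-12-07/dist/mxnet_cu100-1.6.0b20191207-py2.py3-none-manylinux1_x86_64.whl
```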

RE: Proposal for MXNet website improving

2019-12-22 Thread Zhao, Patric
From my view, performance is a big plus for MXNet and one of the main reasons why lots of
people have adopted MXNet.

I still think we need a top-level entry for "Performance".

Thanks,

--Patric 

> -Original Message-
> From: Chen, Ciyong 
> Sent: Monday, December 23, 2019 12:08 PM
> To: dev@mxnet.incubator.apache.org
> Subject: RE: Proposal for MXNet website improving
> 
> Hi Aaron,
> 
> Thanks for your valuable feedback.
> I'll prepare to contribute this change and PR soon, and update the contents
> as suggested.
> 
> Regarding making "Performance" a Key Feature to replace with "Tools &
> Libraries", anything I need to take care when removing "Tools & Libraries"
> part?
> 
> Thanks!
> -Ciyong
> 
> -Original Message-
> From: Aaron Markham 
> Sent: Saturday, December 21, 2019 4:14 AM
> To: dev@mxnet.incubator.apache.org
> Subject: Re: Proposal for MXNet website improving
> 
> Hi Ciyong, thanks for the proposal.
> I like your suggestions. Will you be submitting a PR?
> 
> Some feedback:
> 
> * Regarding changing the URLs, let's avoid that. We just had a lot of work
> trying to fix broken links.
> * As far as changing the headings, sure, Tutorials and FAQs makes sense.
> * Adding performance as a nav item - my preference and going from UX
> guidelines, is to keep the number of them down to less than five or six.
> - What about making performance a Key Feature and highlighting that on
> the main page? I'd switch it with Tools & Libraries since Ecosystem is the 
> next
> thing below.
> 
> Cheers,
> Aaron
> 
> On Thu, Dec 19, 2019 at 2:03 AM Chen, Ciyong 
> wrote:
> >
> > Hi MXNet community,
> >
> > While doing search for MXNet from the official
> website[https://mxnet.incubator.apache.org/], it's not that convenient to
> get the recent/latest performance data, besides there's some mismatch
> between the link and description in the current websites.
> > We can also add some new contents (like distributed training via Horovod,
> and AMP with bfloat16 data type) descriptions in FAQ section.
> >
> > So I propose to improve the current website structure from below 3
> > areas
> >
> > 1. Add a new "Performance" tab in the header, and change
> > "Doc" to "Tutorials" to match the current contents.
> >
> > 2. Align the description of the FAQ section with the inner page.
> >
> > 3. Adjust the FAQ list.
> >
> > Please check the details via below link
> >
> https://drive.google.com/open?id=1gQrC1V1LeJH5NT6zRqBl8Ub2qSr1dc8O
> >
> > Suggestions and comments are highly appreciated.
> >
> > Thanks!
> > -Ciyong
> >


RE: [VOTE] Release Apache MXNet (incubating) version 1.6.0.rc0

2019-12-16 Thread Zhao, Patric
Thanks, Trędak, I will add some words about the new feature to the release notes.

+1 for the vote, because we have run the tests multiple times locally and got the
expected performance boost.

--Patric

> -Original Message-
> From: Przemysław Trędak 
> Sent: Tuesday, December 17, 2019 4:49 AM
> To: d...@mxnet.apache.org
> Subject: [VOTE] Release Apache MXNet (incubating) version 1.6.0.rc0
> 
> Dear MXNet community,
> 
> This is the vote to release Apache MXNet (incubating) version 1.6.0. Voting
> starts now and will close on Friday, 20th December 2019 23:59:59 PST.
> 
> Link to release notes:
> https://cwiki.apache.org/confluence/display/MXNET/1.6.0+Release+notes
> 
> Link to release candidate:
> https://github.com/apache/incubator-mxnet/releases/tag/1.6.0.rc0
> 
> Link to source and signatures on apache dist server:
> https://dist.apache.org/repos/dist/dev/incubator/mxnet/1.6.0.rc0/
> 
> Please remember to TEST first before voting accordingly:
> +1 = approve
> +0 = no opinion
> -1 = disapprove (provide reason)
> 
> Additional notes:
>  - There was an issue[1] raised that 1.6.0.rc0 does not build with clang on
> FreeBSD - I decided to not block the voting for this and instead let the
> Community decide whether this is a blocker for the release.
>  - Patric Zhao and Tao Lv - could you help preparing a paragraph on MKLDNN
> 1.0 update in the New features section in the release notes?
> 
> [1] https://github.com/apache/incubator-mxnet/issues/17076
> 
> Best regards,
> Przemyslaw Tredak


RE: Performance regression from removing libiomp5.so

2019-12-11 Thread Zhao, Patric
Thanks, Sam.

The root cause is the use of a different OpenMP library: Intel OpenMP provides
better performance, as your data shows.

Regarding the release: because of the license issue [1], we can't ship Intel OpenMP in
the binary, but most of the performance boost from MKLDNN is still available.
I think it is acceptable to release 1.6 with MKLDNN + GNU OpenMP at
suboptimal performance.

To achieve the best performance, users should build from source to enable more
advanced features such as Intel MKL, Intel OpenMP, and AVX512.

Thanks,

--Patric

[1] https://www.apache.org/legal/resolved.html#category-x
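As a rough sketch of the "build from source" route mentioned above: the flag names below follow the MXNet 1.x Makefile conventions (`USE_BLAS`, `USE_MKLDNN`), but the exact invocation should be checked against the installation docs for your version, and `USE_BLAS=mkl` assumes Intel MKL is already installed on the machine.

```shell
# Sketch: build MXNet 1.x from source with MKL BLAS + MKLDNN enabled.
# Flag names per the MXNet 1.x Makefile; verify against the install docs.
git clone --recursive https://github.com/apache/incubator-mxnet.git
cd incubator-mxnet
make -j"$(nproc)" USE_BLAS=mkl USE_MKLDNN=1
```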



> -Original Message-
> From: Skalicky, Sam 
> Sent: Wednesday, December 11, 2019 1:36 PM
> To: dev@mxnet.incubator.apache.org
> Cc: Keshavan, Arjuna ; Harish, Nihal
> 
> Subject: Performance regression from removing libiomp5.so
> 
> Hi MXNet community,
> 
> I would like to bring your attention to the performance regression that was
> found [1] between 1.5.1 and 1.6.0 after the libiomp5.so library was removed
> for licensing reasons. This change was made because the library has a
> category-x license [2] that is not compatible with the Apache MXNet
> license/distribution.
> 
> We found that using OpenBLAS instead of MKL BLAS caused a drop
> from 1500 samples/sec to 1300 samples/sec, a 13.3% regression in training
> speed for a resnet18 training benchmark on a C5.18xlarge EC2 instance (with
> 72 cores). Rebuilding with MKL BLAS showed an increase in performance to
> 1600 samples/sec in the 1.6.0 branch.
> 
> Please provide your feedback on the licensing issue (are there any work-
> arounds) and the tradeoff in performance (is the benefit worth trying to
> include back into MXNet builds).
> 
> Thanks to the efforts of the following folks for working on this issue (in no
> particular order):
> Patric Zhao
> Amol Lele
> Tao Lv A
> Pedro Larroy
> Nihal Harish
> Chai Bapat
> Arjuna Keshavan
> Rong Zhang
> 
> Thanks!
> Sam
> 
> [1] https://github.com/apache/incubator-mxnet/issues/16891
> [2] https://www.apache.org/legal/resolved.html#category-x
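The regression figure quoted above can be reproduced directly from the throughput numbers in the message (a quick check using only the numbers reported there):

```python
# Throughput numbers quoted in the report (samples/sec, resnet18 on C5.18xlarge).
mkl_blas_151 = 1500  # 1.5.1 with MKL BLAS + libiomp5
openblas_160 = 1300  # 1.6.0 with OpenBLAS (libiomp5 removed)
mkl_blas_160 = 1600  # 1.6.0 rebuilt with MKL BLAS

regression = (mkl_blas_151 - openblas_160) / mkl_blas_151 * 100
print(f"{regression:.1f}% regression")  # 13.3% regression
```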


RE: MXNet list on Github

2019-11-24 Thread Zhao, Patric
It’s great that we have a full list of MXNet applications.

I think it would be better if the MXNet community maintained an official list on
the MXNet website.

Thanks,

--Patric

From: Chaitanya Bapat 
Sent: Monday, November 25, 2019 8:36 AM
To: dev@mxnet.incubator.apache.org; u...@mxnet.apache.org
Subject: MXNet list on Github

Hello MXNet community,

Whilst searching on Github, I stumbled upon this cool MXNet list - 
https://github.com/chinakook/Awesome-MXNet

I guess this was made as an inspiration from other "Awesome" lists - 
https://github.com/ChristosChristofidis/awesome-deep-learning
https://github.com/bharathgs/Awesome-pytorch-list
https://github.com/jtoy/awesome-tensorflow

I would like to call out the great work done by @chinakook in collecting all things
MXNet in one place. I have pushed a couple of PRs to add to and update the README.
It would be great to see this list updated regularly with the latest and greatest
in the world of MXNet.

Thanks,
Chai
--
Chaitanya Prakash Bapat
+1 (973) 953-6299



roadmap discussion for release 1.7

2019-11-19 Thread Zhao, Patric
Hi MXNet community,

The release 1.6 is WIP and will be released soon. I think it’s time to discuss 
the roadmap of 1.7.

I have created a GitHub thread (#16864) for the new feature discussion.

Feel free to add your plans to it:

https://github.com/apache/incubator-mxnet/issues/16864

Thanks,

--Patric


RE: Proposal to make MKLDNN as default CPU backend

2019-11-19 Thread Zhao, Patric
Thanks for all of the great suggestions.

Regarding the binary releases, including builds w/o MKLDNN, I have summarized a
table (see the attachment).

- Major changes in the python packages: see the attached table.
- Switch on MKLDNN for the binaries without the mkl suffix in release 1.7 (red check mark).
- Add a new mxnet-native build w/o MKLDNN and cuDNN (yellow background).
  Track its usage/downloads for 1-2 releases and then decide whether we need it
long term.
- Drop all mkl-suffix binaries in the next major release, v2.x.

Thanks,

--Patric

> -Original Message-
> From: Lin Yuan 
> Sent: Wednesday, November 20, 2019 5:40 AM
> To: dev@mxnet.incubator.apache.org
> Cc: Tao Lv 
> Subject: Re: Proposal to make MKLDNN as default CPU backend
> 
> Also per Sam's suggestion, we could still release a build without MKLDNN
> (name it mxnet-nomkldnn?) and track the usage/download for one or two
> releases. If there is no usage, we could drop that build in the future.
> 
> Best,
> 
> Lin
> 
> On Tue, Nov 19, 2019 at 1:23 PM Lin Yuan  wrote:
> 
> > Just to summarize, based on the concerns Marco raised and discussed
> > above:
> >
> > - AMD CPU (it should work with MKLDNN:
> > https://cwiki.apache.org/confluence/display/MXNET/MXNet+with+Intel+MKL-DNN+-+Performance+Benchmarking
> > )
> > - ARM CPU (we don't have it today w/o MKLDNN either)
> > - Windows (Windows support is there regardless of MKLDNN or not)
> > - GPU and MKLDNN enabled (already supported)
> > - Fully reproducible results (the medical and financial sectors requested
> > that, and we have some flags for cuda) (the nondeterminism exists even
> > today w/o MKLDNN; we should address it regardless of MKLDNN)
> >
> > Marco, please let us know if your concerns are properly addressed?
> >
> > Given that MKLDNN gives significant performance speed up in CPU, I am
> > inclined to make it default in pip build.
> >
> > Best,
> >
> > Lin
> >
> > On Tue, Nov 19, 2019 at 8:08 AM Chris Olivier 
> > wrote:
> >
> >> Thanks, Patric. I was just trying to point out that there was
> >> currently no guarantee of deterministic results without MKL, so
> >> there’s not necessarily an expectation of determinism with MKL (ie
> requirement isn’t relaxed).
> >>
> >> On Mon, Nov 18, 2019 at 9:38 PM Zhao, Patric 
> >> wrote:
> >>
> >> > It may be a concern, but a little noise can't affect the final results
> >> > if the algorithm is numerically stable.
> >> > The MKLDNN backend in mxnet-mkl has been used for 2 years, and we
> >> > didn't see convergence issues caused by multi-threading.
> >> > In other words, the GPU programming model works well for training even
> >> > though the same nondeterminism from multiple threads exists there.
> >> >
> >> > Some training accuracy results were posted in the first PR when MKLDNN
> >> > was integrated:
> >> https://github.com/apache/incubator-mxnet/pull/8302#issuecomment-359674818
> >> >
> >> > In conclusion, it may happen, but with very low probability. I
> >> > believe we can find a solution in case it happens someday.
> >> >
> >> > Thanks,
> >> >
> >> > --Patric
> >> >
> >> >
> >> > > -Original Message-
> >> > > From: Chris Olivier 
> >> > > Sent: Tuesday, November 19, 2019 11:51 AM
> >> > > To: dev@mxnet.incubator.apache.org
> >> > > Cc: Tao Lv 
> >> > > Subject: Re: Proposal to make MKLDNN as default CPU backend
> >> > >
> >> > > (for non mkl dropout, for instance)
> >> > >
> >> > > On Mon, Nov 18, 2019 at 7:50 PM Chris Olivier
> >> > > 
> >> > > wrote:
> >> > >
> >> > > > To address the deterministic item, I know for a fact that
> >> > > > training will not be deterministic in some cases where the “parallel
> random”
> >> > > > class is utilized in parallel threads, such as OMP, if the
> >> > > > number of cores is different, even with the same seed, because
> >> > > > threads are seeded independently and different number of
> >> > > > threads will end up generating different random number
> >> > > > sequences. Dropout operator being
> >> > > an example.
> >> > > >
> >> > > > On Mon, Nov 18, 2019 at 6:39 PM Alfredo Luque
>

RE: Proposal to make MKLDNN as default CPU backend

2019-11-18 Thread Zhao, Patric
Then we can start tracking this on a regular
> >> > > basis. It
> >> > would
> >> > > be great to actually test on ARM instances now that AWS has A1
> >> instances
> >> > > too…..ill add it to the wish list ;-D
> >> > >
> >> > > Sam
> >> > >
> >> > > > On Nov 18, 2019, at 12:32 PM, Alfredo Luque <
> >> alfredo.lu...@airbnb.com
> >> > .INVALID>
> >> > > wrote:
> >> > > >
> >> > > > Happy to run some benchmarks on an AWS m5a instance (Epyc) and
> >> > > > first generation AMD Threadripper Gen 1 if someone has
> >> > > > something easy to
> >> run
> >> > > and
> >> > > > representative.
> >> > > >
> >> > > > On November 18, 2019 at 12:29:31 PM, Skalicky, Sam (
> >> > > > sska...@amazon.com.invalid) wrote:
> >> > > >
> >> > > > Thanks, a good idea Alfredo. Are you able to help test on AMD CPUs?
> >> Or
> >> > is
> >> > > > there someone else in the mxnet dev@ community who can help?
> >> > > >
> >> > > > Sam
> >> > > >
> >> > > >> On Nov 18, 2019, at 12:27 PM, Alfredo Luque
> >> > > >  wrote:
> >> > > >>
> >> > > >> Verifying that there isn’t a slowdown on AMD CPUs (eg; Ryzen /
> >> Epyc)
> >> > > > would
> >> > > >> definitely make sense as a requirement. It seems odd to
> >> > > >> classify
> >> that
> >> > as
> >> > > > a
> >> > > >> “nonstandard” use case.
> >> > > >>
> >> > > >> On November 18, 2019 at 12:20:33 PM, Skalicky, Sam (
> >> > > >> sska...@amazon.com.invalid) wrote:
> >> > > >>
> >> > > >> Thanks Patric & team for your work over the years to make
> >> > > >> MXNet
> >> fast
> >> > > with
> >> > > >> MKLDNN!
> >> > > >>
> >> > > >> I think it would be great to make MKLDNN enabled by default.
> >> > > >> We
> >> will
> >> > > need
> >> > > >> to continue producing variants without MKLDNN for those who
> >> > > >> don’t
> >> want
> >> > > it
> >> > > >> (Marco enumerated some use cases). How do you propose to
> >> > > >> identify
> >> the
> >> > > pip
> >> > > >> wheels with/without MKLDNN? Previously we had: mxnet-mkl and
> >> > > > mxnet-cu101mkl
> >> > > >> with MKLDNN. If the plain “mxnet” pip wheel now contains
> >> > > >> MKLDNN
> >> what
> >> > do
> >> > > > you
> >> > > >> propose we call the build without MKLDNN? mxnet-nomkl?
> >> > > >>
> >> > > >> Thanks!
> >> > > >> Sam
> >> > > >>
> >> > > >>> On Nov 18, 2019, at 11:08 AM, Marco de Abreu <
> >> > marco.g.ab...@gmail.com>
> >> > > >> wrote:
> >> > > >>>
> >> > > >>> Hi Patric,
> >> > > >>>
> >> > > >>> First of all, thanks a lot to you and your team for all the
> >> > > >>> effort
> >> on
> >> > > >> MXNet
> >> > > >>> and mkldnn!
> >> > > >>>
> >> > > >>> Generally I'm inclined towards your proposal, but I'm
> >> > > >>> thinking
> >> about
> >> > > the
> >> > > >>> non-standard use cases:
> >> > > >>> - AMD CPU
> >> > > >>> - ARM CPU
> >> > > >>> - Windows
> >> > > >>> - GPU and MKLDNN enabled
> >> > > >>> - Fully reproducible results (medical and financial sector
> >> requested
> >> > > > that
> >> > > >>> and we have some flags for cuda)
> >> > > >>>
> >> > > >>> Is mkldnn fully compatible with these use cases? If not, what
> >> would
> >> > > >> happen?
> >> > > >>>

RE: Proposal to make MKLDNN as default CPU backend

2019-11-18 Thread Zhao, Patric
> > > >> fast
> > > with
> > > >> MKLDNN!
> > > >>
> > > >> I think it would be great to make MKLDNN enabled by default. We
> > > >> will
> > > need
> > > >> to continue producing variants without MKLDNN for those who don’t
> want
> > > it
> > > >> (Marco enumerated some use cases). How do you propose to identify
> the
> > > pip
> > > >> wheels with/without MKLDNN? Previously we had: mxnet-mkl and
> > > > mxnet-cu101mkl
> > > >> with MKLDNN. If the plain “mxnet” pip wheel now contains MKLDNN
> > > >> what
> > do
> > > > you
> > > >> propose we call the build without MKLDNN? mxnet-nomkl?
> > > >>
> > > >> Thanks!
> > > >> Sam
> > > >>
> > > >>> On Nov 18, 2019, at 11:08 AM, Marco de Abreu <
> > marco.g.ab...@gmail.com>
> > > >> wrote:
> > > >>>
> > > >>> Hi Patric,
> > > >>>
> > > >>> First of all, thanks a lot to you and your team for all the
> > > >>> effort
> on
> > > >> MXNet
> > > >>> and mkldnn!
> > > >>>
> > > >>> Generally I'm inclined towards your proposal, but I'm thinking
> about
> > > the
> > > >>> non-standard use cases:
> > > >>> - AMD CPU
> > > >>> - ARM CPU
> > > >>> - Windows
> > > >>> - GPU and MKLDNN enabled
> > > >>> - Fully reproducible results (medical and financial sector
> requested
> > > > that
> > > >>> and we have some flags for cuda)
> > > >>>
> > > >>> Is mkldnn fully compatible with these use cases? If not, what
> > > >>> would
> > > >> happen?
> > > >>> If yes, do we have performance numbers?
> > > >>>
> > > >>> Best regards,
> > > >>> Marco
> > > >>>
> > > >>> Zhao, Patric  schrieb am Mo., 18. Nov.
> > > >>> 2019,
> > > >> 14:00:
> > > >>>
> > > >>>> Hi MXNet community,
> > > >>>>
> > > >>>> From the first MKLDNN backend integrated in release 1.2, the
> > community
> > > >> is
> > > >>>> continuously improving the quality and performance of MKLDNN
> > > >>>> CPU
> > > >> backend.
> > > >>>> Nowadays, the MKLDNN backend is widely used for the inference,
> > > >> especially
> > > >>>> for INT8 inference, and we got lots of very positive feedbacks
> from
> > > >> MXNet
> > > >>>> users.
> > > >>>>
> > > >>>> Achieved milestones as below:
> > > >>>>
> > > >>>> - MKLDNN integrated into Apache MXNet from release 1.2, Feb,
> > > >>>> 2018
> > [1]
> > > >>>> - MKLDNN backend as default CPU backend from source building,
> > > >>>> Jan,
> > > 2019
> > > >> [2]
> > > >>>> - MKLDNN subgraph optimization as default for the inference,
> > > >>>> Jul,
> > 2019
> > > >> [3]
> > > >>>> - MKLDNN major version upgrade in release 1.6, Oct, 2019 [4]
> > > >>>>
> > > >>>> To make more successful and technical leadership for Apache
> > > >>>> MXNet
> in
> > > > the
> > > >>>> industry, I propose to make MKLDNN as default CPU backend in
> > > >>>> all
> > > binary
> > > >>>> distribution from the next release.
> > > >>>> The new milestone includes:
> > > >>>>
> > > >>>> - Static link MKLDNN library in the binary avoiding the
> > > >>>> mismatch
> > > > version
> > > >>>> in the runtime [5]
> > > >>>> - Make nightly build with MKLDNN default from master pre 1.7
> release
> > > >>>> - Binary distribution with MKLDNN default from 1.7 release.
> > > >>>>
> > > >>>> What will be changed:
> > > >>>>
> > > >>>> - mxnet and mxnet-cuXX binary will be built with MKLDNN=1
> > > >>>> - mxnet-mkl and mxnet-cuXXmkl will be not changed in the minor
> > release
> > > >>>> (1.x) and plan to remove in next major release (2.0)
> > > >>>>
> > > >>>> Suggestions and comments are highly appreciated.
> > > >>>>
> > > >>>> Thanks,
> > > >>>>
> > > >>>> --Patric
> > > >>>>
> > > >>>>
> > > >>>> [1] https://github.com/apache/incubator-mxnet/pull/9677
> > > >>>> [2]
> > > >>>>
> > > >>
> > > >
> > >
> >
> https://lists.apache.org/thread.html/bfeae6ee46374112eb4dff1470c262959101e4bffb19930926963535@%3Cdev.mxnet.apache.org%3E
> > > >>>> [3] https://github.com/apache/incubator-mxnet/pull/15518
> > > >>>> [4]
> > > >>>>
> > > >>
> > > >
> > >
> >
> https://lists.apache.org/thread.html/f46ab920f18795496eafe713e6e9e561c684e06189085cec17b401dc@%3Cdev.mxnet.apache.org%3E
> > > >>>> [5] https://github.com/apache/incubator-mxnet/pull/16731
> > > >>>>
> > > >>
> > > >> —
> > > >> Alfredo Luque
> > > >> Software Engineer
> > > >> Machine Learning Infrastructure
> > > >> Airbnb
> > > >> San Francisco, CA
> > > >
> > >
> > >
> >
> 


Proposal to make MKLDNN as default CPU backend

2019-11-18 Thread Zhao, Patric
Hi MXNet community,

Since the first MKLDNN backend was integrated in release 1.2, the community has 
been continuously improving the quality and performance of the MKLDNN CPU backend.
Nowadays, the MKLDNN backend is widely used for inference, especially 
INT8 inference, and we have received lots of very positive feedback from MXNet users.

Milestones achieved so far:

- MKLDNN integrated into Apache MXNet from release 1.2, Feb, 2018 [1]
- MKLDNN backend as the default CPU backend for source builds, Jan, 2019 [2]
- MKLDNN subgraph optimization as default for inference, Jul, 2019 [3]
- MKLDNN major version upgrade in release 1.6, Oct, 2019 [4]
 
To strengthen Apache MXNet's success and technical leadership in the industry, 
I propose making MKLDNN the default CPU backend in all binary 
distributions from the next release.
The new milestone includes:

- Statically link the MKLDNN library into the binary, avoiding version mismatches 
at runtime [5]
- Make MKLDNN the default in nightly builds from master before the 1.7 release
- Ship binary distributions with MKLDNN as default starting with the 1.7 release.

What will be changed:

- The mxnet and mxnet-cuXX binaries will be built with MKLDNN=1
- mxnet-mkl and mxnet-cuXXmkl will not change in minor releases (1.x) 
and are planned for removal in the next major release (2.0)
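For anyone who wants to verify which backend a given binary actually carries, before or after the switch, MXNet's runtime feature list can be queried. A minimal sketch, assuming an MXNet >= 1.5 pip package is installed; the package names in the comments follow the proposal and are illustrative:

```shell
# Sketch: check whether an installed MXNet binary was built with MKLDNN.
# Assumes an MXNet >= 1.5 pip package is already installed.
python -c "import mxnet; print(mxnet.runtime.Features().is_enabled('MKLDNN'))"

# Under the proposal, the plain flavors would carry MKLDNN by default:
#   pip install mxnet          # CPU build, MKLDNN=1
#   pip install mxnet-cu101    # CUDA build, MKLDNN=1
# while mxnet-mkl / mxnet-cuXXmkl stay unchanged in 1.x and go away in 2.0.
```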

Suggestions and comments are highly appreciated.

Thanks,

--Patric


[1] https://github.com/apache/incubator-mxnet/pull/9677
[2] 
https://lists.apache.org/thread.html/bfeae6ee46374112eb4dff1470c262959101e4bffb19930926963535@%3Cdev.mxnet.apache.org%3E
[3] https://github.com/apache/incubator-mxnet/pull/15518
[4] 
https://lists.apache.org/thread.html/f46ab920f18795496eafe713e6e9e561c684e06189085cec17b401dc@%3Cdev.mxnet.apache.org%3E
[5] https://github.com/apache/incubator-mxnet/pull/16731


RE: RE: MXNet 1.6.0 release

2019-11-17 Thread Zhao, Patric
I plan to cherry-pick the PR below into 1.6. Please review.

https://github.com/apache/incubator-mxnet/pull/16837
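For readers unfamiliar with the backport mechanics, cherry-picking moves one merged commit onto the release branch. A self-contained sketch using a throwaway demo repository (branch, commit, and file names here are illustrative, not the actual PR):

```shell
# Self-contained sketch of the backport workflow: cherry-picking one
# commit onto a release branch, as done for fixes targeted at 1.6.x.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "dev@example.com"
git config user.name "dev"
echo "base" > file.txt
git add file.txt
git commit -qm "initial commit"
git branch v1.6.x                       # release branch forks here
echo "fix" >> file.txt
git add file.txt
git commit -qm "bugfix merged to master"
fix_sha=$(git rev-parse HEAD)           # commit to backport
git checkout -q v1.6.x
git cherry-pick "$fix_sha" > /dev/null  # apply the fix on the release branch
grep -q "fix" file.txt && echo "backport applied"
```

In the real workflow the cherry-pick targets the project's v1.6.x branch and the commit SHA of the merged PR, and conflicts are resolved by hand before pushing.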

> -Original Message-
> From: Tan, Jonathan 
> Sent: Monday, November 18, 2019 9:43 AM
> To: d...@mxnet.apache.org
> Subject: Re: RE: MXNet 1.6.0 release
> 
> Hi MXNet Community,
> 
> I’ve been doing some testing on the performance of MXnet 1.6.x vs 1.5.1 and
> I noticed some regression in training. You can find more details here:
> https://github.com/apache/incubator-mxnet/issues/16845
> 
> Thanks,
> Jonathan
> 

RE: MXNet 1.6.0 release

2019-11-13 Thread Zhao, Patric
@Przemek, what's the status of the 1.6 release?

Do we have an ETA for the voting in dev@ and general@?

Thanks,

--Patric

> -Original Message-
> From: Chaitanya Bapat 
> Sent: Friday, November 8, 2019 2:19 PM
> To: dev@mxnet.incubator.apache.org
> Cc: d...@mxnet.apache.org
> Subject: Re: MXNet 1.6.0 release
> 
> Thanks Przemyslaaw for leading and managing the 1.6 release!
> 
> Moreover, thanks for clarifying the difference between code-freeze and release
> candidate.
> 
> Currently, log_softmax for Large Tensor would fail. It is fixed in this PR by 
> Hao -
> https://github.com/apache/incubator-mxnet/pull/16711
> It would be great to have that cherry-picked.
> Thanks
> Chai
> 
> 
> On Thu, 7 Nov 2019 at 17:33, Zhao, Patric  wrote:
> 
> > Thanks for the great efforts.
> >
> > I think below PR need to be backported to 1.6 for bugfix in large
> > tensor supports.
> > https://github.com/apache/incubator-mxnet/pull/16737
> >
> > --Patric
> >
> >
> > > -Original Message-
> > > From: Przemysław Trędak 
> > > Sent: Friday, November 8, 2019 5:46 AM
> > > To: d...@mxnet.apache.org
> > > Subject: Re: MXNet 1.6.0 release
> > >
> > > Dear MXNet Community,
> > >
> > > From talking to different Members of the Community, I realized there
> > > is a misunderstanding of what "code freeze" actually means. Let me
> > > try to
> > clear
> > > this confusion in this email.
> > >
> > > The code freeze does not mean "1.6 release is done, let's vote on it
> > > and
> > ship
> > > it as-is". As some of You probably noticed, I did not tag a RC0 yet.
> > That is
> > > because code freeze means "there are no more new features going to
> > > be accepted in order to provide stable base for finding and fixing
> > > bugs". I
> > know
> > > of a few showstopper issues that need to be tackled before a release
> > > candidate can be made (mentioned in the previous email), so tagging
> > > a release candidate would not really make sense.
> > >
> > > I would like to repeat my call for action to test the release,
> > > create
> > issues and
> > > tag me on issues which need to be prioritized for the 1.6 release,
> > > as
> > well as
> > > help fixing those issues (the fixes will be cherry-picked to 1.6.x
> > branch).
> > >
> > > Thank you
> > > Przemek
> > >
> > > On 2019/11/02 03:11:55, Przemysław Trędak 
> > > wrote:
> > > > Dear MXNet Community,
> > > >
> > > > This morning I updated the 1.6.x branch and so the code freeze is
> > > > in
> > effect.
> > > I would like to thank everyone who helped in preparing and reviewing
> > > pull requests to meet this deadline.
> > > >
> > > > Unfortunately, nightly tests do not currently pass (I created an
> > > > issue
> > about
> > > this: [1]). Another issue [2] was raised to my attention as
> > > potential
> > release
> > > blocker. Please help in fixing those issues and also tag me on other
> > issues
> > > that you believe must be fixed before release.
> > > >
> > > > Thank you
> > > > Przemek
> > > >
> > > > [1] https://github.com/apache/incubator-mxnet/issues/16704
> > > > [2] https://github.com/apache/incubator-mxnet/issues/16647
> > > >
> > > > On 2019/10/25 14:24:49, Przemysław Trędak 
> > > wrote:
> > > > > Dear MXNet Community
> > > > >
> > > > > Last night I updated 1.6.x branch to point to current master.
> > > > > The
> > code
> > > freeze is now in effect.
> > > > >
> > > > > That said, since most of the features intended for 1.6 release
> > > > > are
> > still not
> > > fully finished (a few PRs for BERT GPU performance, multiple MKLDNN
> > > PRs, multiple PRs tagged NumPy etc.) we decided to go with a "soft"
> > > code
> > freeze
> > > approach. Only the PRs that are in the scope of 1.6 release will now
> > > be accepted into 1.6.x branch. The hard code freeze is planned next
> > > week,
> > Oct
> > > 31st.
> > > > >
> > > > > While contributors of those in-scope PRs and their reviewers
> > > > > work to
> > > meet that deadline, I would like to call f

RE: BytePS-MXNet Integration

2019-11-10 Thread Zhao, Patric
I read the proposal, but it says little technically about why BytePS is better 
than Horovod or other hardware vendors' libraries.
It would be better if more technical details of BytePS were included in the 
proposal.

Thanks,

--Patric

> -Original Message-
> From: Lin Yuan 
> Sent: Sunday, November 10, 2019 1:58 PM
> To: dev@mxnet.incubator.apache.org
> Subject: Re: BytePS-MXNet Integration
> 
> Very interesting proposal. I have tried BytePS on some examples and did see
> better performance than Horovod. I look forward to this integration and feel
> free to let the community know if any help is needed.
> 
> Lin


RE: MXNet 1.6.0 release

2019-11-07 Thread Zhao, Patric
Thanks for the great efforts.

I think the PR below needs to be backported to 1.6 as a bugfix for large tensor 
support.
https://github.com/apache/incubator-mxnet/pull/16737

--Patric


> -Original Message-
> From: Przemysław Trędak 
> Sent: Friday, November 8, 2019 5:46 AM
> To: d...@mxnet.apache.org
> Subject: Re: MXNet 1.6.0 release
> 
> Dear MXNet Community,
> 
> From talking to different Members of the Community, I realized there is a
> misunderstanding of what "code freeze" actually means. Let me try to clear
> this confusion in this email.
> 
> The code freeze does not mean "1.6 release is done, let's vote on it and ship
> it as-is". As some of You probably noticed, I did not tag a RC0 yet. That is
> because code freeze means "there are no more new features going to be
> accepted in order to provide stable base for finding and fixing bugs". I know
> of a few showstopper issues that need to be tackled before a release
> candidate can be made (mentioned in the previous email), so tagging a
> release candidate would not really make sense.
> 
> I would like to repeat my call for action to test the release, create issues 
> and
> tag me on issues which need to be prioritized for the 1.6 release, as well as
> help fixing those issues (the fixes will be cherry-picked to 1.6.x branch).
> 
> Thank you
> Przemek
> 
> On 2019/11/02 03:11:55, Przemysław Trędak 
> wrote:
> > Dear MXNet Community,
> >
> > This morning I updated the 1.6.x branch and so the code freeze is in effect.
> I would like to thank everyone who helped in preparing and reviewing pull
> requests to meet this deadline.
> >
> > Unfortunately, nightly tests do not currently pass (I created an issue about
> this: [1]). Another issue [2] was raised to my attention as potential release
> blocker. Please help in fixing those issues and also tag me on other issues
> that you believe must be fixed before release.
> >
> > Thank you
> > Przemek
> >
> > [1] https://github.com/apache/incubator-mxnet/issues/16704
> > [2] https://github.com/apache/incubator-mxnet/issues/16647
> >
> > On 2019/10/25 14:24:49, Przemysław Trędak 
> wrote:
> > > Dear MXNet Community
> > >
> > > Last night I updated 1.6.x branch to point to current master. The code
> freeze is now in effect.
> > >
> > > That said, since most of the features intended for 1.6 release are still 
> > > not
> fully finished (a few PRs for BERT GPU performance, multiple MKLDNN PRs,
> multiple PRs tagged NumPy etc.) we decided to go with a "soft" code freeze
> approach. Only the PRs that are in the scope of 1.6 release will now be
> accepted into 1.6.x branch. The hard code freeze is planned next week, Oct
> 31st.
> > >
> > > While contributors of those in-scope PRs and their reviewers work to
> meet that deadline, I would like to call for action for the rest of the MXNet
> Community to test, raise issues and fix the bugs in the release.
> > >
> > > Thank you
> > > Przemek
> > >
> > > On 2019/10/11 00:00:34, Przemysław Trędak
>  wrote:
> > > > Hi MXNet Community,
> > > >
> > > > As the 1.5.1 patch release is done (many thanks Tao!), it is time to
> prepare for the next minor release of MXNet - 1.6.0.
> > > >
> > > > I (ptrendx@github / ptredak@mxnet Slack) would like to manage the
> release of 1.6.0. As it will be the first time for me to manage a release, Sam
> (samskalicky) and Lin (apeforest) agreed to help guiding me through the
> process.
> > > >
> > > > Thanks to Sheng there is a GitHub issue[1] listing major features that
> should go into the 1.6.0, please add any features that you want included
> there.
> > > >
> > > > That said, as we target November for the release, to accommodate for
> extensive testing and bugfixing, the code freeze date is set to October 24th
> 23:59PST. Please reach out to me as soon as possible if you feel that you will
> need an extension of that deadline for your feature.
> > > >
> > > > Sheng created a page on cwiki[2] about the release, I will populate it
> with the information and tracked issues and PRs.
> > > >
> > > > Thank you and let's make the great 1.6.0 release together!
> > > > Przemek
> > > >
> > > > [1] https://github.com/apache/incubator-mxnet/issues/15589
> > > > [2]
> https://cwiki.apache.org/confluence/display/MXNET/1.6.0+Release+Plan+a
> nd+Status
> > > >
> > >
> >


RE: RE: MXNet 1.6.0 release

2019-11-01 Thread Zhao, Patric
The issue is fixed by https://github.com/apache/incubator-mxnet/pull/16693

Do the latest nightly build and tests pass?

Thanks,

--Patric
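As a postscript for anyone hitting this class of loader failure again, the linkage of the installed binary can be inspected directly. A diagnostic sketch, assuming a pip-installed MXNet on Linux with ldd available; the library path is resolved from the package and may differ in other layouts:

```shell
# Sketch: diagnose "libmkldnn.so.1 not found" loader errors by listing
# what the MXNet native library links against at runtime.
# Assumes a pip-installed MXNet on Linux; adjust paths for other setups.
libmxnet=$(python -c "import os, mxnet; print(os.path.join(os.path.dirname(mxnet.__file__), 'libmxnet.so'))")
ldd "$libmxnet" | grep -i mkldnn    # a 'not found' entry reproduces the CI error
```

Static linking of MKLDNN, as proposed elsewhere in this thread, removes this failure mode entirely since no separate libmkldnn.so.1 needs to be resolved.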

> -Original Message-
> From: Zhao, Patric 
> Sent: Friday, November 1, 2019 12:13 PM
> To: dev@mxnet.incubator.apache.org; d...@mxnet.apache.org
> Subject: RE: RE: MXNet 1.6.0 release
> 
> Sure, I will see the issue.
> 
> > -Original Message-
> > From: Przemysław Trędak 
> > Sent: Friday, November 1, 2019 11:27 AM
> > To: d...@mxnet.apache.org
> > Subject: Re: RE: MXNet 1.6.0 release
> >
> > Hi Patric,
> >
> > Actually the nightly tests show some problems with machines not being
> > able to find libmkldnn.so.1 (see e.g. here:
> > http://jenkins.mxnet-ci.amazon-
> > ml.com/blue/organizations/jenkins/NightlyTestsForBinaries/detail/maste
> > r/4 92/pipeline). I'm not sure if this is just a problem with
> > configuration of CI machines for nightly tests, but please take a look
> > at this.
> >
> > Przemek
> >
> > On 2019/11/01 02:41:01, "Zhao, Patric"  wrote:
> > > Hi Przemek,
> > >
> > > The MKLDNN upgrade PR was merged in Oct 31.  Please double check the
> > nightly build and going forward for the release progress.
> > >
> > > Feel free to ping me if anything we can help.
> > >
> > > Thanks,
> > >
> > > --Patric
> > >
> > > > -Original Message-
> > > > From: Przemysław Trędak 
> > > > Sent: Friday, October 25, 2019 10:25 PM
> > > > To: d...@mxnet.apache.org
> > > > Subject: Re: MXNet 1.6.0 release
> > > >
> > > > Dear MXNet Community
> > > >
> > > > Last night I updated 1.6.x branch to point to current master. The
> > > > code freeze is now in effect.
> > > >
> > > > That said, since most of the features intended for 1.6 release are
> > > > still not fully finished (a few PRs for BERT GPU performance,
> > > > multiple MKLDNN PRs, multiple PRs tagged NumPy etc.) we decided to
> > > > go with a "soft" code freeze approach. Only the PRs that are in
> > > > the scope of 1.6 release will now be accepted into 1.6.x branch.
> > > > The hard code freeze is planned next week, Oct 31st.
> > > >
> > > > While contributors of those in-scope PRs and their reviewers work
> > > > to meet that deadline, I would like to call for action for the
> > > > rest of the MXNet Community to test, raise issues and fix the bugs in 
> > > > the
> release.
> > > >
> > > > Thank you
> > > > Przemek
> > > >
> > > > On 2019/10/11 00:00:34, Przemysław Trędak 
> > > > wrote:
> > > > > Hi MXNet Community,
> > > > >
> > > > > As the 1.5.1 patch release is done (many thanks Tao!), it is
> > > > > time to prepare
> > > > for the next minor release of MXNet - 1.6.0.
> > > > >
> > > > > I (ptrendx@github / ptredak@mxnet Slack) would like to manage
> > > > > the
> > > > release of 1.6.0. As it will be the first time for me to manage a
> > > > release, Sam
> > > > (samskalicky) and Lin (apeforest) agreed to help guiding me
> > > > through the process.
> > > > >
> > > > > Thanks to Sheng there is a GitHub issue[1] listing major
> > > > > features that
> > > > should go into the 1.6.0, please add any features that you want
> > > > included there.
> > > > >
> > > > > That said, as we target November for the release, to accommodate
> > > > > for
> > > > extensive testing and bugfixing, the code freeze date is set to
> > > > October 24th 23:59PST. Please reach out to me as soon as possible
> > > > if you feel that you will need an extension of that deadline for your 
> > > > feature.
> > > > >
> > > > > Sheng created a page on cwiki[2] about the release, I will
> > > > > populate it with
> > > > the information and tracked issues and PRs.
> > > > >
> > > > > Thank you and let's make the great 1.6.0 release together!
> > > > > Przemek
> > > > >
> > > > > [1] https://github.com/apache/incubator-mxnet/issues/15589
> > > > > [2]
> > > > https://cwiki.apache.org/confluence/display/MXNET/1.6.0+Release+Pl
> > > > an
> > > > +a
> > > > nd+Status
> > > > >
> > >


RE: RE: MXNet 1.6.0 release

2019-10-31 Thread Zhao, Patric
Sure, I will look into the issue.

> -Original Message-
> From: Przemysław Trędak 
> Sent: Friday, November 1, 2019 11:27 AM
> To: d...@mxnet.apache.org
> Subject: Re: RE: MXNet 1.6.0 release
> 
> Hi Patric,
> 
> Actually the nightly tests show some problems with machines not being able
> to find libmkldnn.so.1 (see e.g. here: http://jenkins.mxnet-ci.amazon-
> ml.com/blue/organizations/jenkins/NightlyTestsForBinaries/detail/master/4
> 92/pipeline). I'm not sure if this is just a problem with configuration of CI
> machines for nightly tests, but please take a look at this.
> 
> Przemek
> 
> On 2019/11/01 02:41:01, "Zhao, Patric"  wrote:
> > Hi Przemek,
> >
> > The MKLDNN upgrade PR was merged in Oct 31.  Please double check the
> nightly build and going forward for the release progress.
> >
> > Feel free to ping me if anything we can help.
> >
> > Thanks,
> >
> > --Patric
> >
> > > -Original Message-
> > > From: Przemysław Trędak 
> > > Sent: Friday, October 25, 2019 10:25 PM
> > > To: d...@mxnet.apache.org
> > > Subject: Re: MXNet 1.6.0 release
> > >
> > > Dear MXNet Community
> > >
> > > Last night I updated 1.6.x branch to point to current master. The
> > > code freeze is now in effect.
> > >
> > > That said, since most of the features intended for 1.6 release are
> > > still not fully finished (a few PRs for BERT GPU performance,
> > > multiple MKLDNN PRs, multiple PRs tagged NumPy etc.) we decided to
> > > go with a "soft" code freeze approach. Only the PRs that are in the
> > > scope of 1.6 release will now be accepted into 1.6.x branch. The
> > > hard code freeze is planned next week, Oct 31st.
> > >
> > > While contributors of those in-scope PRs and their reviewers work to
> > > meet that deadline, I would like to call for action for the rest of
> > > the MXNet Community to test, raise issues and fix the bugs in the release.
> > >
> > > Thank you
> > > Przemek
> > >
> > > On 2019/10/11 00:00:34, Przemysław Trędak 
> > > wrote:
> > > > Hi MXNet Community,
> > > >
> > > > As the 1.5.1 patch release is done (many thanks Tao!), it is time
> > > > to prepare
> > > for the next minor release of MXNet - 1.6.0.
> > > >
> > > > I (ptrendx@github / ptredak@mxnet Slack) would like to manage the
> > > release of 1.6.0. As it will be the first time for me to manage a
> > > release, Sam
> > > (samskalicky) and Lin (apeforest) agreed to help guiding me through
> > > the process.
> > > >
> > > > Thanks to Sheng there is a GitHub issue[1] listing major features
> > > > that
> > > should go into the 1.6.0, please add any features that you want
> > > included there.
> > > >
> > > > That said, as we target November for the release, to accommodate
> > > > for
> > > extensive testing and bugfixing, the code freeze date is set to
> > > October 24th 23:59PST. Please reach out to me as soon as possible if
> > > you feel that you will need an extension of that deadline for your 
> > > feature.
> > > >
> > > > Sheng created a page on cwiki[2] about the release, I will
> > > > populate it with
> > > the information and tracked issues and PRs.
> > > >
> > > > Thank you and let's make the great 1.6.0 release together!
> > > > Przemek
> > > >
> > > > [1] https://github.com/apache/incubator-mxnet/issues/15589
> > > > [2]
> > > https://cwiki.apache.org/confluence/display/MXNET/1.6.0+Release+Plan
> > > +a
> > > nd+Status
> > > >
> >


RE: MXNet 1.6.0 release

2019-10-31 Thread Zhao, Patric
Hi Przemek,

The MKLDNN upgrade PR was merged on Oct 31. Please double-check the nightly 
build and move forward with the release process.

Feel free to ping me if there is anything we can help with.

Thanks,

--Patric

> -Original Message-
> From: Przemysław Trędak 
> Sent: Friday, October 25, 2019 10:25 PM
> To: d...@mxnet.apache.org
> Subject: Re: MXNet 1.6.0 release
> 
> Dear MXNet Community
> 
> Last night I updated 1.6.x branch to point to current master. The code freeze
> is now in effect.
> 
> That said, since most of the features intended for 1.6 release are still not 
> fully
> finished (a few PRs for BERT GPU performance, multiple MKLDNN PRs,
> multiple PRs tagged NumPy etc.) we decided to go with a "soft" code freeze
> approach. Only the PRs that are in the scope of 1.6 release will now be
> accepted into 1.6.x branch. The hard code freeze is planned next week, Oct
> 31st.
> 
> While contributors of those in-scope PRs and their reviewers work to meet
> that deadline, I would like to call for action for the rest of the MXNet
> Community to test, raise issues and fix the bugs in the release.
> 
> Thank you
> Przemek
> 
> On 2019/10/11 00:00:34, Przemysław Trędak 
> wrote:
> > Hi MXNet Community,
> >
> > As the 1.5.1 patch release is done (many thanks Tao!), it is time to prepare
> for the next minor release of MXNet - 1.6.0.
> >
> > I (ptrendx@github / ptredak@mxnet Slack) would like to manage the
> release of 1.6.0. As it will be the first time for me to manage a release, Sam
> (samskalicky) and Lin (apeforest) agreed to help guiding me through the
> process.
> >
> > Thanks to Sheng there is a GitHub issue[1] listing major features that
> should go into the 1.6.0, please add any features that you want included
> there.
> >
> > That said, as we target November for the release, to accommodate for
> extensive testing and bugfixing, the code freeze date is set to October 24th
> 23:59PST. Please reach out to me as soon as possible if you feel that you will
> need an extension of that deadline for your feature.
> >
> > Sheng created a page on cwiki[2] about the release, I will populate it with
> the information and tracked issues and PRs.
> >
> > Thank you and let's make the great 1.6.0 release together!
> > Przemek
> >
> > [1] https://github.com/apache/incubator-mxnet/issues/15589
> > [2]
> https://cwiki.apache.org/confluence/display/MXNET/1.6.0+Release+Plan+a
> nd+Status
> >


RE: MXNet 1.6.0 release

2019-10-25 Thread Zhao, Patric
Thanks, Przemek.

We're catching up on the MKL-DNN upgrade work, but the unstable CI is currently 
slowing down our development progress a lot.
Hopefully we can merge all the PRs next week if CI is back up soon.

I will keep you updated on our progress.

Thanks,

--Patric

> -Original Message-
> From: Przemysław Trędak 
> Sent: Friday, October 25, 2019 10:25 PM
> To: d...@mxnet.apache.org
> Subject: Re: MXNet 1.6.0 release
> 
> Dear MXNet Community
> 
> Last night I updated 1.6.x branch to point to current master. The code freeze 
> is
> now in effect.
> 
> That said, since most of the features intended for 1.6 release are still not 
> fully
> finished (a few PRs for BERT GPU performance, multiple MKLDNN PRs, multiple
> PRs tagged NumPy etc.) we decided to go with a "soft" code freeze approach.
> Only the PRs that are in the scope of 1.6 release will now be accepted into 
> 1.6.x
> branch. The hard code freeze is planned next week, Oct 31st.
> 
> While contributors of those in-scope PRs and their reviewers work to meet that
> deadline, I would like to call for action for the rest of the MXNet Community 
> to
> test, raise issues and fix the bugs in the release.
> 
> Thank you
> Przemek
> 
> On 2019/10/11 00:00:34, Przemysław Trędak 
> wrote:
> > Hi MXNet Community,
> >
> > As the 1.5.1 patch release is done (many thanks Tao!), it is time to 
> > prepare for
> the next minor release of MXNet - 1.6.0.
> >
> > I (ptrendx@github / ptredak@mxnet Slack) would like to manage the release
> of 1.6.0. As it will be the first time for me to manage a release, Sam 
> (samskalicky)
> and Lin (apeforest) agreed to help guiding me through the process.
> >
> > Thanks to Sheng there is a GitHub issue[1] listing major features that 
> > should go
> into the 1.6.0, please add any features that you want included there.
> >
> > That said, as we target November for the release, to accommodate for
> extensive testing and bugfixing, the code freeze date is set to October 24th
> 23:59PST. Please reach out to me as soon as possible if you feel that you will
> need an extension of that deadline for your feature.
> >
> > Sheng created a page on cwiki[2] about the release, I will populate it with 
> > the
> information and tracked issues and PRs.
> >
> > Thank you and let's make the great 1.6.0 release together!
> > Przemek
> >
> > [1] https://github.com/apache/incubator-mxnet/issues/15589
> > [2]
> https://cwiki.apache.org/confluence/display/MXNET/1.6.0+Release+Plan+and+
> Status
> >


RE: new website, docs code freeze

2019-10-08 Thread Zhao, Patric
Thanks, Thomas, it's good to have a site-wide search bar.

FYI, there is a similar feature at https://pytorch.org/

--Patric

> -Original Message-
> From: Thomas DELTEIL 
> Sent: Wednesday, October 9, 2019 1:41 AM
> To: dev@mxnet.incubator.apache.org
> Subject: Re: new website, docs code freeze
>
> Hi Patric,
>
> The search bar is available in the python docs:
> https://mxnet.apache.org/api/python/docs/api/ (on the top right). Since the
> homepage is not built by sphinx anymore there are no more search bar there.
> We are considering using an external plugin to maintain a site-wide index and
> provide better search experience than the sphinx one.
> btw you were asking about the mkldnn tutorials, they are now here:
> https://mxnet.apache.org/api/python/docs/tutorials/performance/backend
> /mkldnn/index.html
>
> All the best,
>
> Thomas Delteil
>
> On Mon, Oct 7, 2019 at 19:58, Zhao, Patric  wrote:
>
> > I find there is no "search bar" in the website today.
> >
> > Could anyone check it?
> >
> > Thanks,
> >
> > --Patric
> >
> > > -Original Message-
> > > From: Thomas DELTEIL 
> > > Sent: Saturday, October 5, 2019 3:41 AM
> > > To: dev@mxnet.incubator.apache.org
> > > Subject: Re: new website, docs code freeze
> > >
> > > Hi Haibin,
> > >
> > > We are currently working with Soji on overhauling the way the python
> > > docs are organized to get better and more consistent docs with full
> > > coverage,
> > the
> > > current system is a brittle and hard to browse. We hope to finish
> > > our dev work by tonight, ETA for early next week.
> > > There is no ETA on bringing back the old docs, though that's the
> > > next
> > highest
> > > priority feature on the list after improving the coverage of the
> > > python
> > API.
> > >
> > > All the best,
> > >
> > > Thomas Delteil
> > >
> > > On Fri, Oct 4, 2019, 12:34 Haibin Lin  wrote:
> > >
> > > > Yes, that is the correct one.
> > > >
> > > > On a separate note, are we removing documentation versioning from
> > > > the website? How do we switch between the master/nightly version
> > > > and the stable version for the python API doc? Maybe there's a
> > > > switch somewhere but I cannot find it.
> > > >
> > > > Also, I find that the API doc for many methods are missing, for
> > > > example, the Dataset.transform function has detailed documentation
> > > > on input and output types, but the doc only shows the one-line
> > > > description of the method
> > > >
> > > >
> > >
> https://mxnet.apache.org/api/python/docs/api/gluon/_autogen/mxnet.gl
> > > u
> > > o
> > > > n.data.Dataset.html?highlight=dataset#
> > > > .
> > > > Same for other methods such as filter, shard, etc.
> > > >
> > > > Thanks.
> > > >
> > > > Best,
> > > > Haibin
> > > >
> > > >
> > > > On Thu, Oct 3, 2019 at 7:59 AM Aaron Markham
> > > > 
> > > > wrote:
> > > >
> > > > > Hi Haibin, you mean this one?
> > > > >
> > > > >
> > > > https://github.com/apache/incubator-
> > > mxnet/blob/master/docs/static_site
> > > > /src/pages/api/faq/distributed_training.md
> > > > > If so, it looks like a link update is needed.
> > > > >
> > > > > On Wed, Oct 2, 2019 at 9:42 PM Haibin Lin
> > > > > 
> > > > > wrote:
> > > > > >
> > > > > > I find that the 'distributed training with KVStore' tutorial
> > > > > > is
> > gone.
> > > > Are
> > > > > > we adding it back?
> > > > > >
> > > > >
> > > >
> > >
> https://mxnet.apache.org/api/python/docs/tutorials/performance/index
> > > .h
> > > > tml?highlight=distributed#distributed-training
> > > > > >
> > > > > >
> > > > > > On Tue, Oct 1, 2019 at 4:54 AM Marco de Abreu
> > > > > >  > > > >
> > > > > > wrote:
> > > > > >
> > > > > > > Thanks for the update,

RE: new website, docs code freeze

2019-10-07 Thread Zhao, Patric
 > > > > be merged but a manual update to the website has been done in
> > > > > > the meanwhile.
> > > > > > - Automated analysis from google lighthouse scoring, compared
> > > > > > to
> > the
> > > old
> > > > > > website: Performance saw a > ~100% improvement, SEO saw a ~25%
> > > > > improvement,
> > > > > > and best practices improved by ~19%. Thanks Russell D. for
> > > > > > running
> > > the
> > > > > > analysis.
> > > > > >
> > > > > > Remaining
> > > > > > - *[high priority]* API docs are still missing some classes /
> > > packages /
> > > > > > methods. ETA for fix is EOW, root cause has been identified,
> > > > > > we are
> > > still
> > > > > > deciding what's the best way forward for good discoverability
> > > > > > as
> > > well as
> > > > > > good coverage and maintainability.
> > > > > > - Adding quick links to directly access Python API docs on
> > homepage.
> > > > > >
> > > > > > All the best,
> > > > > >
> > > > > > Thomas Delteil
> > > > > >
> > > > > > Le mar. 24 sept. 2019 à 19:15, Thomas DELTEIL <
> > > thomas.delte...@gmail.com
> > > > > >
> > > > > > a
> > > > > > écrit :
> > > > > >
> > > > > > > @Philip Yes we're looking at link redirects for older links
> > > > > > > that
> > > might
> > > > > be
> > > > > > > hosted externally (using htaccess is my preferred way to
> > > > > > > handle
> > it
> > > for
> > > > > > now
> > > > > > > as you sugested) and we'll use a broken link checker to
> > > > > > > update
> > the
> > > > > links
> > > > > > > that are hosted internally. We'll update the 404 to add an
> > > explanation
> > > > > on
> > > > > > > the website update. Google indexes will slowly update across
> > > > > > > the
> > > week
> > > > > so
> > > > > > > the google search issues will be less of a problem.
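As a companion to the broken-link checker mentioned above, here is a minimal offline sketch that scans a static site on disk for internal hrefs pointing at files that don't exist. The `site_root` layout and the href regex are simplifications for illustration, not the actual tooling used:

```python
import pathlib
import re

def internal_links(html):
    """Extract root-relative hrefs like /install/index.html (simplified regex)."""
    return re.findall(r'href="(/[^"#]*)"', html)

def broken_links(site_root):
    """Return (page, link) pairs whose link target is missing on disk."""
    root = pathlib.Path(site_root)
    broken = []
    for page in root.glob("**/*.html"):
        for link in internal_links(page.read_text(errors="ignore")):
            target = root / link.lstrip("/")
            # Accept either a plain file or a directory with an index.html
            if not target.exists() and not (target / "index.html").exists():
                broken.append((str(page), link))
    return broken
```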
> > > > > > >
> > > > > > > If you find any such links yourself, or missing tutorials,
> > > > > > > please
> > > > > > consider
> > > > > > > stepping up and helping fixing them. The more people get
> > > > > > > familiar
> > > with
> > > > > > the
> > > > > > > new website architecture, the least likely it is to fall in
> > > > > > > a
> > > state of
> > > > > > > stalled updates like the previous one.
> > > > > > >
> > > > > > > For the sphinx issues in the python mini-website, missing
> > > > > > > API
> > > classes,
> > > > > if
> > > > > > > anybody is familiar with it, I'd love for us to bring back
> > > > > > > the
> > > > > automatic
> > > > > > > doc generation for each package so at least we have a list
> > > > > > > of all
> > > > > > available
> > > > > > > classes in each sub package rather than relying on manual
> > > insertion of
> > > > > > each
> > > > > > > class, which is brittle and not future proof. @Lin, Haibin
> > > > > > >  if you have experience with it, could
> > > > > > > we
> > > sync up
> > > > > > > offline on how you suggest to do that based on your
> > > > > > > gluon-nlp
> > > > > experience?
> > > > > > >
> > > > > > > @Marco, I'm currently traveling for ICDAR in Sydney, and
> > > > > > > Aaron is
> > > on
> > > > > PTO
> > > > > > > in Europe, I'll try make time today to help with the fixes
> > > > > > > since
> > > it is
> > > > > > > impacting a lot of users.
> > > > > > >
> > > > > > > In the meanwhile, any help is appreciated, and more than the
> > value

RE: new website, docs code freeze

2019-09-22 Thread Zhao, Patric
For the install page [1], I suggest adding a selector for the DeepNumpy backend [2], 
which would be cleaner.

[1] http://mxnet.incubator.apache.org/index.html
[2] https://numpy.mxnet.io/#installation



> -Original Message-
> From: kellen sunderland 
> Sent: Monday, September 23, 2019 12:47 PM
> To: dev@mxnet.incubator.apache.org
> Subject: Re: new website, docs code freeze
> 
> New site looks good.  I do notice that a few tutorials from the old site are
> missing (for example the TensorRT tutorial).  Any plans to bring them back?
> 
> On Sun, Sep 22, 2019 at 10:04 AM Haibin Lin 
> wrote:
> 
> > Another issue I found with the current website: the Sphinx object
> > inventory
> > <https://www.sphinx-doc.org/en/master/usage/extensions/intersphinx.html>
> > file https://mxnet.apache.org/objects.inv is missing. GluonNLP
> > relies on this file to link documentation across projects. Shall we add it
> > back?
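For context, this is how a downstream Sphinx project such as GluonNLP typically consumes a published objects.inv via the intersphinx extension. A minimal conf.py sketch, assuming the inventory is served at the site root:

```python
# Sketch of a downstream project's Sphinx conf.py using intersphinx.
# Sphinx downloads <base URL>/objects.inv at build time, so cross-project
# references (e.g. :class:`mxnet.ndarray.NDArray`) only resolve while the
# inventory file is actually being served.
extensions = ["sphinx.ext.intersphinx"]

intersphinx_mapping = {
    # name: (docs base URL, inventory location; None means <base>/objects.inv)
    "mxnet": ("https://mxnet.apache.org/", None),
}
```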
> >
> > Best,
> > Haibin
> >
> > On Sun, Sep 22, 2019 at 2:04 AM Lieven Govaerts  wrote:
> >
> > > Hi,
> > >
> > >
> > > On Sat, 21 Sep 2019 at 06:28, Thomas DELTEIL
> > > 
> > > wrote:
> > >
> > > > Thanks all for the feedback,
> > > >
> > > > We'll send an email next week with the list of missing features,
> > content
> > > > and bugs that we plan to fix.
> > > > We took the option of releasing early, with some features missing,
> > rather
> > > > than trying to be at feature parity with the old website before
> > launching
> > > > the website.
> > > > The reason why we decided to do that is two-fold:
> > > > - playing catch-up with docs in master introduce daily conflicts
> > > > that
> > > need
> > > > to be resolved and introduce opportunity for errors
> > > > - by releasing early, we can take advantage of the community
> > > contributions
> > > > in modifying whatever the community feels like a better way of
> > > > doing things.
> > > >
> > > > One of the goals of the new website was to disentangle the main
> > website,
> > > > now called "static_site" to the auto-generated docs. Now the
> > > > overall
> > site
> > > > is made of a main static site, with easy to modify content and
> > > > easy to understand architecture for anybody familiar with basic
> > > > html, and a collection of mini-websites for each language bindings
> > > > that can be
> > built
> > > in
> > > > isolation and that are self-contained. Actually the new CI jobs
> > > > builds
> > > all
> > > > of them in parallel independently.
> > > >
> > > > There is PLENTY of room for improvement, it would be great if the
> > > community
> > > > can help contribute to bring the new website at the same level of
> > content
> > > > richness as the old one, and then even further.
> > > >
> > > > Missing features:
> > > > - As pointed by Haibin, the API docs do not have the full list of
> > > operators
> > > > and classes. There is a mix of auto-generated docs based on
> > > > packages,
> > and
> > > > some docs that are spelled out manually to improve the logical
> > > organization
> > > > of the package where there is a need. The drawback with manually
> > > > listed classes in a package is that it's very easy to miss some.
> > > > If someone
> > > wanted
> > > > to build a sanity check that would automatically detect which
> > > > classes
> > are
> > > > not in the documentation, or if someone knew how to enable that
> > > > with sphinx, that would be a great addition to the python docs
> > > > - There is missing content in the python tutorials, and the
> > > discoverability
> > > > could be improved. Some old tutorials have not been migrated just yet.
> > > > - The nightly tests on tutorials have been disabled for now
> > > > - There is no "Download jupyter notebook" for tutorials just yet.
> > > > - Non-python tutorials might benefit from a blurb description and
> > > > a
> > > better
> > > > content organization.
> > > > - Python tutorials could be better organized, have a picture
> > accompanying
> > > > their description
> > > > - There is no site-wide search, this is not an easy problem to
> > > > solve to
> > > be
> > > > fair given the static nature of the website, but maybe an external
> > plugin
> > > > might be able to give a half-way solution
> > > > - There is no version selector for the docs
> > > > - There is bug in search box of the python docs, but this is just
> > > > a
> > small
> > > > JS bug that can be fixed easily (on my list for next week)
> > > > - Most old links have not had a redirect put in place.
> > > >
> > > >
> > > I noticed on the Ubuntu home page in the Developer dropdown that the
> > > link MXNet on Ubuntu <  > with Nvidia
> > > <https://www.nvidia.com/en-us/data-center/gpu-accelerated-applications/mxnet/>
> > > doesn't work anymore, it points to:
> > > https://mxnet.incubator.apache.org/install/index.html
> > >
> > > Also, on the MXNet 'getting started' page
> > > 

RE: [Announcement] New PPMC Member - Tao Lv

2019-09-22 Thread Zhao, Patric
Congratulations, Tao!


> -Original Message-
> From: Sheng Zha 
> Sent: Monday, September 23, 2019 12:07 PM
> To: d...@mxnet.apache.org
> Subject: [Announcement] New PPMC Member - Tao Lv
> 
> Hi all,
> 
> Please join me in welcoming Tao Lv as a new PPMC member of Apache
> MXNet (incubating)!
> 
> Tao has been a committer of our project since Nov. 2018, and has remained
> very active in not only maintaining MKLDNN backend, but many other areas.
> Over time he and the Intel team has greatly helped the project on CPU
> performance.
> 
> Welcome, Tao!
> 
> -sz


RE: new website, docs code freeze

2019-09-21 Thread Zhao, Patric
Minor suggestion:

I think we can add more to the features page to attract users and also highlight 
MXNet's differentiators.
Things like quantization, faster inference and training, Horovod support, 
AMP, automatic fusion on the fly... 
http://mxnet.incubator.apache.org/features

Thanks,

--Patric


> -Original Message-
> From: Thomas DELTEIL 
> Sent: Saturday, September 21, 2019 12:29 PM
> To: dev@mxnet.incubator.apache.org
> Subject: Re: new website, docs code freeze
> 
> Thanks all for the feedback,
> 
> We'll send an email next week with the list of missing features, content and
> bugs that we plan to fix.
> We took the option of releasing early, with some features missing, rather
> than trying to be at feature parity with the old website before launching the
> website.
> The reason why we decided to do that is two-fold:
> - playing catch-up with docs in master introduce daily conflicts that need to
> be resolved and introduce opportunity for errors
> - by releasing early, we can take advantage of the community contributions
> in modifying whatever the community feels like a better way of doing things.
> 
> One of the goals of the new website was to disentangle the main website,
> now called "static_site" to the auto-generated docs. Now the overall site is
> made of a main static site, with easy to modify content and easy to
> understand architecture for anybody familiar with basic html, and a collection
> of mini-websites for each language bindings that can be built in isolation and
> that are self-contained. Actually the new CI jobs builds all of them in 
> parallel
> independently.
> 
> There is PLENTY of room for improvement, it would be great if the
> community can help contribute to bring the new website at the same level of
> content richness as the old one, and then even further.
> 
> Missing features:
> - As pointed by Haibin, the API docs do not have the full list of operators 
> and
> classes. There is a mix of auto-generated docs based on packages, and some
> docs that are spelled out manually to improve the logical organization of the
> package where there is a need. The drawback with manually listed classes in
> a package is that it's very easy to miss some. If someone wanted to build a
> sanity check that would automatically detect which classes are not in the
> documentation, or if someone knew how to enable that with sphinx, that
> would be a great addition to the python docs
> - There is missing content in the python tutorials, and the discoverability
> could be improved. Some old tutorials have not been migrated just yet.
> - The nightly tests on tutorials have been disabled for now
> - There is no "Download jupyter notebook" for tutorials just yet.
> - Non-python tutorials might benefit from a blurb description and a better
> content organization.
> - Python tutorials could be better organized, have a picture accompanying
> their description
> - There is no site-wide search, this is not an easy problem to solve to be 
> fair
> given the static nature of the website, but maybe an external plugin might be
> able to give a half-way solution
> - There is no version selector for the docs
> - There is a bug in the search box of the python docs, but this is just a small
> JS bug that can be fixed easily (on my list for next week)
> - Most old links have not had a redirect put in place.
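Regarding the sanity check suggested above for classes missing from the docs, a rough sketch follows. It diffs a module's public classes/functions against a list of documented names; the stdlib json module and the deliberately incomplete doc list are stand-ins for the real MXNet package and its doc sources:

```python
import inspect
import json  # stand-in for the package being documented

def public_api(mod):
    """Public classes and functions exposed by a module."""
    return {name for name, obj in inspect.getmembers(mod)
            if not name.startswith("_")
            and (inspect.isclass(obj) or inspect.isfunction(obj))}

def undocumented(mod, documented_names):
    """Names present in the module but missing from the documented list."""
    return sorted(public_api(mod) - set(documented_names))

# Pretend the docs only list the four common functions:
missing = undocumented(json, ["dump", "dumps", "load", "loads"])
print(missing)  # classes such as JSONDecoder/JSONEncoder show up as undocumented
```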
> 
> We'll formalize this in github issues next week, but they are all fairly 
> small and
> helping out on these would be a great way of familiarizing yourself with the
> new website build system and website architecture.
> 
>  Thanks all for the feedback, please keep it coming!
> 
> Thomas Delteil
> 
> Le sam. 21 sept. 2019 à 09:53, Haibin Lin  a écrit :
> 
> > It looks like my previous email did not go through. Re-sending:
> >
> > Hi Aaron,
> >
> > The website looks cool. Thanks for pushing this to production. A few
> > questions:
> >
> > - I was looking for the API doc for mx.sym.dot, but I find that most
> > operators under mx.sym.* are missing. Is this expected?
> > - I was also checking the search functionality, searching the keyword
> > "ndarray" only returns one result "mxnet.ndarray.NDArray", which
> > doesn't seem right. There animation keeps going (Searching. ->
> > Searching.. -> Searching ...) and gives me an impression that the
> > search is never completely done(?).
> >
> > Best,
> > Haibin
> >
> >
> > On Fri, Sep 20, 2019 at 4:50 PM Chaitanya Bapat 
> > wrote:
> >
> > > Thanks Aaron and the team for launching new website!
> > >
> > > 1. There's no search button anywhere on the landing page.
> > > 2. I wasn't able to find FAQ (and without search button I dont have
> > option
> > > but to go manually on each menu). Only when I go to Docs
> > > -> FAQ
> > > -> Extend and Cotribute (that I got what I wanted).
> > >
> > > Suggestions
> > > Might want to make this searchable and pop FAQ on the main page (or
> > > somewhere prominent)
> > >
> > > Thanks,
> > > Chai
> > >
> > >
> > 

RE: [VOTE] Release Apache MXNet (incubating) 1.5.1.rc0

2019-09-18 Thread Zhao, Patric
+1 

Tested MKLDNN backend and everything looks great.

> -Original Message-
> From: Qing Lan 
> Sent: Wednesday, September 18, 2019 2:20 AM
> To: dev@mxnet.incubator.apache.org
> Subject: Re: [VOTE] Release Apache MXNet (incubating) 1.5.1.rc0
> 
> +1 for Scala/Java test. Passed all tests for CPU/GPU build.
> Also tested build from source with static build.
> 
> Thanks,
> Qing
> 
> From: Tao Lv 
> Sent: Tuesday, September 17, 2019 14:14
> To: dev@mxnet.incubator.apache.org 
> Subject: [VOTE] Release Apache MXNet (incubating) 1.5.1.rc0
> 
> Dear MXNet community,
> 
> 
> 
> This is the 3-day vote to release Apache MXNet (incubating) version 1.5.1.
> 
> Voting on dev@ will start September 17, 12:00pm (PST)  and close on
> September 20, 12:00pm (PST).
> 
> 
> 
> 1) Link to release notes:
> 
> https://cwiki.apache.org/confluence/display/MXNET/1.5.1+Release+Notes
> 
> 
> 
> 2) Link to release candidate:
> 
> https://github.com/apache/incubator-mxnet/releases/tag/1.5.1.rc0
> 
> 
> 
> 3) Link to source and signatures on Apache dist server:
> 
> https://dist.apache.org/repos/dist/dev/incubator/mxnet/1.5.1.rc0/
> 
> 
> 
> Please remember to TEST first before voting accordingly:
> 
> +1 = approve
> 
> +0 = no opinion
> 
> -1 = disapprove (provide reason)
> 
> 
> 
> Thanks,
> 
> -tao


new website (RE: CI and PRs)

2019-08-14 Thread Zhao, Patric
Hi Aaron,

Recently, we have been working on improving the CPU backend documentation based on 
the current website.

I saw there are several PRs to update the new website and it's really great.

Thus, I'd like to know when the new website will be online. 
If it's very near, we will switch our work to the new website.

Thanks,

--Patric


> -Original Message-
> From: Aaron Markham 
> Sent: Thursday, August 15, 2019 11:40 AM
> To: dev@mxnet.incubator.apache.org
> Subject: Re: CI and PRs
> 
> The PRs Thomas and I are working on for the new docs and website share
> the mxnet binary in the new CI pipelines we made. Speeds things up a lot.
> 
> On Wed, Aug 14, 2019, 18:16 Chris Olivier  wrote:
> 
> > I see it done daily now, and while I can’t share all the details, it’s
> > not an incredibly complex thing, and involves not much more than
> > nfs/efs sharing and remote ssh commands.  All it takes is a little
> > ingenuity and some imagination.
> >
> > On Wed, Aug 14, 2019 at 4:31 PM Pedro Larroy
> >  > >
> > wrote:
> >
> > > Sounds good in theory. I think there are complex details with
> > > regards of resource sharing during parallel execution. Still I think
> > > both ways can
> > be
> > > explored. I think some tests run for unreasonably long times for
> > > what
> > they
> > > are doing. We already scale parts of the pipeline horizontally
> > > across workers.
> > >
> > >
> > > On Wed, Aug 14, 2019 at 5:12 PM Chris Olivier
> > > 
> > > wrote:
> > >
> > > > +1
> > > >
> > > > Rather than remove tests (which doesn’t scale as a solution), why
> > > > not
> > > scale
> > > > them horizontally so that they finish more quickly? Across
> > > > processes or even on a pool of machines that aren’t necessarily the
> build machine?
> > > >
> > > > On Wed, Aug 14, 2019 at 12:03 PM Marco de Abreu <
> > marco.g.ab...@gmail.com
> > > >
> > > > wrote:
> > > >
> > > > > With regards to time I rather prefer us spending a bit more time
> > > > > on maintenance than somebody running into an error that could've
> > > > > been
> > > caught
> > > > > with a test.
> > > > >
> > > > > I mean, our Publishing pipeline for Scala GPU has been broken
> > > > > for
> > quite
> > > > > some time now, but nobody noticed that. Basically my stance on
> > > > > that
> > > > matter
> > > > > is that as soon as something is not blocking, you can also just
> > > > deactivate
> > > > > it since you don't have a forcing function in an open source project.
> > > > > People will rarely come back and fix the errors of some nightly
> > > > > test
> > > that
> > > > > they introduced.
> > > > >
> > > > > -Marco
> > > > >
> > > > > Carin Meier  schrieb am Mi., 14. Aug.
> > > > > 2019,
> > > 21:59:
> > > > >
> > > > > > If a language binding test is failing for a not important
> > > > > > reason,
> > > then
> > > > it
> > > > > > is too brittle and needs to be fixed (we have fixed some of
> > > > > > these
> > > with
> > > > > the
> > > > > > Clojure package [1]).
> > > > > > But in general, if we thinking of the MXNet project as one
> > > > > > project
> > > that
> > > > > is
> > > > > > across all the language bindings, then we want to know if some
> > > > > fundamental
> > > > > > code change is going to break a downstream package.
> > > > > > I can't speak for all the high level package binding
> > > > > > maintainers,
> > but
> > > > I'm
> > > > > > always happy to pitch in to provide code fixes to help the
> > > > > > base PR
> > > get
> > > > > > green.
> > > > > >
> > > > > > The time costs to maintain such a large CI project obviously
> > > > > > needs
> > to
> > > > be
> > > > > > considered as well.
> > > > > >
> > > > > > [1] https://github.com/apache/incubator-mxnet/pull/15579
> > > > > >
> > > > > > On Wed, Aug 14, 2019 at 3:48 PM Pedro Larroy <
> > > > > pedro.larroy.li...@gmail.com
> > > > > > >
> > > > > > wrote:
> > > > > >
> > > > > > > From what I have seen Clojure is 15 minutes, which I think
> > > > > > > is
> > > > > reasonable.
> > > > > > > The only question is that when a binding such as R, Perl or
> > Clojure
> > > > > > fails,
> > > > > > > some devs are a bit confused about how to fix them since
> > > > > > > they are
> > > not
> > > > > > > familiar with the testing tools and the language.
> > > > > > >
> > > > > > > On Wed, Aug 14, 2019 at 11:57 AM Carin Meier <
> > carinme...@gmail.com
> > > >
> > > > > > wrote:
> > > > > > >
> > > > > > > > Great idea Marco! Anything that you think would be
> > > > > > > > valuable to
> > > > share
> > > > > > > would
> > > > > > > > be good. The duration of each node in the test stage
> > > > > > > > sounds
> > like
> > > a
> > > > > good
> > > > > > > > start.
> > > > > > > >
> > > > > > > > - Carin
> > > > > > > >
> > > > > > > > On Wed, Aug 14, 2019 at 2:48 PM Marco de Abreu <
> > > > > > marco.g.ab...@gmail.com>
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > Hi,
> > > > > > > > >
> > > > > > > > > we record a bunch of metrics about run statistics (down
> > > > > > > > > to
> > 

RE: [Discussion] MXNet 1.5.1 release

2019-08-12 Thread Zhao, Patric
Thanks for the explanation, Marco & Tao. Sounds great!

> -Original Message-
> From: Tao Lv 
> Sent: Monday, August 12, 2019 9:54 PM
> To: dev@mxnet.incubator.apache.org
> Subject: Re: [Discussion] MXNet 1.5.1 release
> 
> > Regarding the open issue, is there default code owner/maintainer? If
> > so, he/she will be the right people to look into the issue.
> > https://github.com/apache/incubator-mxnet/blob/master/CODEOWNERS
> >
> 
> I have no idea. But the CODEOWNERS file is used to receive change notifications;
> it does not actually indicate the maintainer of a piece of code.
> 
> Do we have regularly build, run, functionality and performance testing for
> > this release?
> 
> 
> As Marco mentioned, build, run and functionality of v1.5.x branch are tracked
> automatically by the CI for each cherry pick pull request and the nightly 
> tests
> here:
> http://jenkins.mxnet-ci.amazon-
> ml.com/blue/organizations/jenkins/NightlyTestsForBinaries/activity.
> I see it's healthy so far.
> 
> For performance, Shufan will track CPU performance with his test suite and
> send out the report once the branch is frozen. I'm not sure if there are any
> other performance tests.
> 
> On Mon, Aug 12, 2019 at 9:36 PM Marco de Abreu
> 
> wrote:
> 
> > Hi Patric,
> >
> > CI should automatically pick up the branch and validate it as usual.
> >
> > Best regards,
> > Marco
> >
> > Zhao, Patric  schrieb am Mo., 12. Aug. 2019, 15:22:
> >
> > > It's great works, Tao 
> > >
> > > Regarding the open issue, is there default code owner/maintainer? If
> > > so, he/she will be the right people to look into the issue.
> > > https://github.com/apache/incubator-
> mxnet/blob/master/CODEOWNERS
> > >
> > > Do we have regularly build, run, functionality and performance
> > > testing
> > for
> > > this release?
> > >
> > > Thanks,
> > >
> > > --Patric
> > >
> > > > -Original Message-
> > > > From: Tao Lv 
> > > > Sent: Monday, August 12, 2019 8:59 PM
> > > > To: dev@mxnet.incubator.apache.org
> > > > Subject: Re: [Discussion] MXNet 1.5.1 release
> > > >
> > > > Update:
> > > >
> > > > We're cherry picking fixes from the master to the v1.5.x branch.
> > > > Some
> > of
> > > > them are already merged. Please find details on the cwiki page:
> > > > https://cwiki.apache.org/confluence/display/MXNET/1.5.1+Release+Pl
> > > > an+a
> > > > nd+Status
> > > >
> > > >
> > > >  There are still 3 opens:
> > > > 1. Nightly test failure on CI (
> > > > https://github.com/apache/incubator-mxnet/issues/15374): The issue
> > > > is
> > > still
> > > > open. I'm wondering if it has been fixed or not. If not, is there
> > anyone
> > > > working on it?
> > > > 2. Broken Sidebar on website API for master and 1.5.0 (
> > > > https://github.com/apache/incubator-mxnet/issues/15200): I don't
> > > > see
> > any
> > > > progress on this issue? Do we still want to include it into 1.5.1
> > > > patch
> > > release?
> > > > 3. License issues need to be fixed before 1.6 release (
> > > > https://github.com/apache/incubator-mxnet/issues/15542): Currently
> > > > the license issue for code and images is partially fixed on the
> > > > master
> > > branch and
> > > > will be picked to v1.5.x soon. MKLML license issue is pushed out
> > > > to 1.6 release. But license issue for cub and pybind is still open.
> > > >
> > > > Let me know if you any suggestion. Thanks for your support!
> > > >
> > > > -tao
> > > >
> > > >
> > > > On Wed, Aug 7, 2019 at 11:03 PM Tao Lv  wrote:
> > > >
> > > > >
> > > > > Update:
> > > > >
> > > > > Thanks to wkcn's report, Issue #15774 [1] and the fix #15751 [2]
> > > > > are added to the scope of 1.5.1 patch release.
> > > > > For issue #15703 [3], I'm still waiting from the response from
> > > > > the reporter.
> > > > > Issue #15431 [4] was closed as false positive report.
> > > > > I also included several MKL-DNN backend issues reported by mxnet
> > users
> > > > > and downstream projects. They are already fixed on the master
> branch.
> > > >

RE: [Discussion] MXNet 1.5.1 release

2019-08-12 Thread Zhao, Patric
It's great work, Tao 

Regarding the open issues, is there a default code owner/maintainer? If so, he/she 
will be the right person to look into the issue.
https://github.com/apache/incubator-mxnet/blob/master/CODEOWNERS

Do we have regular build, run, functionality, and performance testing for this 
release?

Thanks,

--Patric

> -Original Message-
> From: Tao Lv 
> Sent: Monday, August 12, 2019 8:59 PM
> To: dev@mxnet.incubator.apache.org
> Subject: Re: [Discussion] MXNet 1.5.1 release
> 
> Update:
> 
> We're cherry picking fixes from the master to the v1.5.x branch. Some of
> them are already merged. Please find details on the cwiki page:
> https://cwiki.apache.org/confluence/display/MXNET/1.5.1+Release+Plan+a
> nd+Status
> 
> 
>  There are still 3 opens:
> 1. Nightly test failure on CI (
> https://github.com/apache/incubator-mxnet/issues/15374): The issue is still
> open. I'm wondering if it has been fixed or not. If not, is there anyone
> working on it?
> 2. Broken Sidebar on website API for master and 1.5.0 (
> https://github.com/apache/incubator-mxnet/issues/15200): I don't see any
> progress on this issue? Do we still want to include it into 1.5.1 patch 
> release?
> 3. License issues need to be fixed before 1.6 release (
> https://github.com/apache/incubator-mxnet/issues/15542): Currently the
> license issue for code and images is partially fixed on the master branch and
> will be picked to v1.5.x soon. MKLML license issue is pushed out to 1.6
> release. But license issue for cub and pybind is still open.
> 
> Let me know if you any suggestion. Thanks for your support!
> 
> -tao
> 
> 
> On Wed, Aug 7, 2019 at 11:03 PM Tao Lv  wrote:
> 
> >
> > Update:
> >
> > Thanks to wkcn's report, Issue #15774 [1] and the fix #15751 [2] are
> > added to the scope of 1.5.1 patch release.
> > For issue #15703 [3], I'm still waiting from the response from the
> > reporter.
> > Issue #15431 [4] was closed as false positive report.
> > I also included several MKL-DNN backend issues reported by mxnet users
> > and downstream projects. They are already fixed on the master branch.
> >
> > Please kindly check the full list of issues need be included in the
> > 1.5.1 patch release:
> >
> https://cwiki.apache.org/confluence/display/MXNET/1.5.1+Release+Plan+a
> > nd+Status
> >
> > For issues which are already fixed on the master branch, we will start
> > to cherry pick the fix commit to the v1.5.x branch. For issues which
> > are still open, we will start to track the fix process.
> >
> > Thanks for your great support. Let me know if you have any questions
> > or concerns.
> >
> > -tao
> >
> > [1] https://github.com/apache/incubator-mxnet/issues/15774
> > [2] https://github.com/apache/incubator-mxnet/pull/15751
> > [3] https://github.com/apache/incubator-mxnet/issues/15703
> > [4] https://github.com/apache/incubator-mxnet/issues/15431
> >
> >
> > On Tue, Aug 6, 2019 at 2:04 PM Tao Lv  wrote:
> >
> >>
> >> Per Sam's proposal [1], Issue #15737 [2] and the fix [3] are added to
> >> the scope of 1.5.1 patch release.
> >>
> >> A friendly reminder: the issue proposing will be closed before 11pm
> >> 8/7 CST (8am 8/7 PST). After that, we will start to cherry pick fixes
> >> to the v1.5.x branch.
> >>
> >>
> >> [1]
> >> https://github.com/apache/incubator-
> mxnet/issues/15613#issuecomment-5
> >> 18430120 [2] https://github.com/apache/incubator-mxnet/issues/15737
> >> [3] https://github.com/apache/incubator-mxnet/pull/15692
> >>
> >> On Thu, Aug 1, 2019 at 4:24 PM Tao Lv  wrote:
> >>
> >>> Hi Sandeep/Lai,
> >>>
> >>> Thank you for the prompt response!
> >>>
> >>> https://github.com/apache/incubator-mxnet/issues/15200  is added to
> >>> the list to track the sidebar issue.
> >>>
> >>> On Thu, Aug 1, 2019 at 7:54 AM sandeep krishnamurthy <
> >>> sandeep.krishn...@gmail.com> wrote:
> >>>
>  Thank you Tao and Shufan.
>  Sidebar missing bug in API documentation is inconvenience for the user.
>  It
>  would great if we can fix it with 1.5.1
> 
>  On Wed, Jul 31, 2019, 10:14 AM Lai Wei  wrote:
> 
>  > Hi Tao,
>  >
>  > Thank you so much for driving it.  Currently nightly test on
>  tutorials are
>  > failing and it need to be fixed. [3] I have updated the issue[1]
>  > and cwiki.[2]
>  >
>  > [1] https://github.com/apache/incubator-mxnet/issues/15613
>  > [2]
>  >
>  >
> 
> https://cwiki.apache.org/confluence/display/MXNET/1.5.1+Release+Pla
>  n+and+Status
>  > [3] https://github.com/apache/incubator-mxnet/issues/15374
>  >
>  > Best Regards
>  >
>  > Lai
>  >
>  >
>  > On Wed, Jul 31, 2019 at 8:04 AM Tao Lv  wrote:
>  >
>  > >  Hi community,
>  > >
>  > >
>  > >
>  > > Thanks for the initiative from Sam (samskalicky@github), we
>  > > already
>  > have a
>  > > discussion thread [1] on github about the defects and bugs
>  > > exposed
>  in the
>  > > 1.5.0 release.
> 

RE: [Announcement] New Committer - Lai Wei

2019-08-03 Thread Zhao, Patric
Congratulations, Lai. 

Well done on the very challenging 1.5 release, and you kept the progress going 
smoothly 


> -Original Message-
> From: kellen sunderland 
> Sent: Sunday, August 4, 2019 9:32 AM
> To: dev@mxnet.incubator.apache.org
> Subject: Re: [Announcement] New Committer - Lai Wei
> 
> Congrats Lai.  Well deserved.
> 
> On Sat, Aug 3, 2019, 6:18 PM Jake Lee  wrote:
> 
> > Congratulations! Lai
> >
> > On Sat, Aug 3, 2019 at 6:13 PM Sheng Zha  wrote:
> >
> > > Hi all,
> > >
> > > Please join me in welcoming Lai (Roy) Wei as a new committer of
> > > Apache MXNet (incubating)!
> > >
> > > Lai was one of the main contributor to the MXNet Keras frontend.
> > > More recently, he contributed the Gluon estimator API, which enables
> > > easy usage and better modularization for the training scripts of
> > > Gluon. He also persisted and helped driving the long-running 1.5.0
> > > release process.
> > >
> > > Welcome, Lai!
> > >
> > > -sz
> > >
> > >
> >


RE: [VOTE] Release Apache MXNet (incubating) version 1.5.0.rc2

2019-07-09 Thread Zhao, Patric
+1

Tested MXNet with the MKLDNN backend for fp32/int8 inference and training coverage. 
Both functionality and performance are great 

> -Original Message-
> From: sandeep krishnamurthy 
> Sent: Wednesday, July 10, 2019 7:03 AM
> To: dev@mxnet.incubator.apache.org
> Cc: d...@mxnet.apache.org
> Subject: Re: [VOTE] Release Apache MXNet (incubating) version 1.5.0.rc2
> 
> +1
> 
> 
> 
> Download source -
> https://github.com/apache/incubator-
> mxnet/releases/download/1.5.0.rc2/apache-mxnet-src-1.5.0.rc2-
> incubating.tar.gz
> 
> 
> -[Y] Are release files in correct location?
> 
> -[Y] Do release files have the word incubating in their name?
> 
> -[Y] Does DISCLAIMER file exist?
> 
> -[Y] Do LICENSE and NOTICE files exists?
> 
> -[Y] Is the LICENSE and NOTICE text correct?
> 
> -[Y] Is the NOTICE year correct?
> 
> Is there any 3rd party code contained inside the release? If so:
> 
> -[Y] Does the software have a compatible license?
> 
> -[Y] Are all software licenses mentioned in LICENSE?
> 
> -[Y] Is the full text of the licenses (or pointers to it) in LICENSE?
> 
> Is any of this code Apache licensed? Do they have NOTICE files? If so:
> 
> -[Y] Have relevant parts of those NOTICE files been added to this NOTICE
> 
> file?
> 
> -[Y] Do all source files have ASF headers?
> 
> -[Y] Do the contents of the release match with what's tagged in version
> control?
> 
> -[N] Are there any unexpected binary files in the release?
> 
> -[Y] Can you compile from source? Are the instruction clear?
> 
> 
> Apart from above checks, I built from source on a GPU machine with CUDA
> 10, using following command:
> 
> make -j32  USE_BLAS=openblas USE_CUDA=1 USE_OPENMP=1
> USE_PROFILER=1
> USE_CUDNN=1 USE_CUDA_PATH=/usr/local/cuda USE_OPENCV=1
> 
> 
> and ran operator performance test via opperf following the instructions
> ( https://github.com/apache/incubator-
> mxnet/blob/1.5.0.rc2/benchmark/opperf/README.md
> ), that executes ~150 MXNet operators. No broken functionality observed.
> 
> 
> 
> 
> On Tue, Jul 9, 2019 at 3:24 PM Lai Wei  wrote:
> 
> > +1
> > Tested the following works fine:
> >
> > 1. Built from source on OSX, Ubuntu CPU, GPU 2. Ran example/gluon
> > image classification on CPU, GPU 3. Built latest Keras-MXNet from
> > source and all tests passed.
> >
> >
> >
> > Best Regards
> >
> > Lai
> >
> >
> > On Tue, Jul 9, 2019 at 2:11 PM Qing Lan  wrote:
> >
> > > Have successfully fixed the issue on OSX.
> > >
> > > Scala/Java build is fine:
> > >
> > > osx-cpupassed (Qing)
> > > linux-cpu  passed (Zach)
> > > linux-gpu  passed (Zach)
> > >
> > > +1 for the release.
> > >
> > > Thanks,
> > > Qing
> > >
> > >
> > > 
> > > From: Qing Lan 
> > > Sent: Monday, July 8, 2019 12:47
> > > To: d...@mxnet.apache.org; dev@mxnet.incubator.apache.org
> > > Subject: Re: [VOTE] Release Apache MXNet (incubating) version
> > > 1.5.0.rc2
> > >
> > > Hi All,
> > >
> > > I found the problem when I tried to build from source with my Mac:
> > >
> > > clang: error: unsupported option '-fopenmp'
> > > clang: error: unsupported option '-fopenmp'
> > > make: *** [build/src/operator/nn/mkldnn/mkldnn_act.o] Error 1
> > > make: *** [build/src/operator/nn/cudnn/cudnn_batch_norm.o] Error 1
> > >
> > > I use "make -j4" with tar.gz package
> > >
> > > Thanks,
> > > Qing
> > >
> > >
> > >
> > > 
> > > From: Sheng Zha 
> > > Sent: Friday, July 5, 2019 17:42
> > > To: d...@mxnet.apache.org
> > > Subject: Re: [VOTE] Release Apache MXNet (incubating) version
> > > 1.5.0.rc2
> > >
> > > +1
> > >
> > > On 2019/06/27 17:05:40, Lai Wei  wrote:
> > > > Dear MXNet community,
> > > >
> > > > This is the 3-day vote to release Apache MXNet (incubating)
> > > > version
> > > 1.5.0.
> > > > Voting on dev@ will start June 26, 23:59:59(PST)  and close on
> > > > June
> > 29,
> > > > 23:59:59.
> > > >
> > > > 1) Link to release notes:
> > > >
> https://cwiki.apache.org/confluence/display/MXNET/1.5.0+Release+No
> > > > tes
> > > >
> > > >
> > > > 2) Link to release candidate:
> > > >
> > > > https://github.com/apache/incubator-mxnet/releases/tag/1.5.0.rc2
> > > >
> > > >
> > > >
> > > > 3) Link to source and signatures on apache dist server:
> > > >
> > > > https://dist.apache.org/repos/dist/dev/incubator/mxnet/1.5.0.rc2/
> > > >
> > > >
> > > >
> > > > Please remember to TEST first before voting accordingly:
> > > >
> > > > +1 = approve
> > > > +0 = no opinion
> > > > -1 = disapprove (provide reason)
> > > > --
> > > > Best Regards
> > > >
> > > > Lai
> > > >
> > >
> >
> 
> 
> --
> Sandeep Krishnamurthy


RE: [VOTE] Release Apache MXNet (incubating) version 1.5.0.rc1

2019-06-26 Thread Zhao, Patric
 Pip Info---
> > > > Version  : 19.1.1
> > > > Directory: /home/piotr/mxnet_1.4/py3_venv/lib/python3.6/site-
> packages/pip
> > > > --MXNet Info---
> > > > Version  : 1.4.1
> > > > Directory: /home/piotr/mxnet_1.4/python/mxnet
> > > > Hashtag not found. Not installed from pre-built package.
> > > > --System Info--
> > > > Platform : Linux-4.15.0-1035-aws-x86_64-with-Ubuntu-18.04-bionic
> > > > system   : Linux
> > > > node : ip-172-31-63-171
> > > > release  : 4.15.0-1035-aws
> > > > version  : #37-Ubuntu SMP Mon Mar 18 16:15:14 UTC 2019
> > > > --Hardware Info--
> > > > machine  : x86_64
> > > > processor: x86_64
> > > > Architecture:x86_64
> > > > CPU op-mode(s):  32-bit, 64-bit
> > > > Byte Order:  Little Endian
> > > > CPU(s):  72
> > > > On-line CPU(s) list: 0-71
> > > > Thread(s) per core:  2
> > > > Core(s) per socket:  18
> > > > Socket(s):   2
> > > > NUMA node(s):2
> > > > Vendor ID:   GenuineIntel
> > > > CPU family:  6
> > > > Model:   85
> > > > Model name:  Intel(R) Xeon(R) Platinum 8124M CPU @ 3.00GHz
> > > > Stepping:4
> > > > CPU MHz: 1223.344
> > > > BogoMIPS:6000.00
> > > > Hypervisor vendor:   KVM
> > > > Virtualization type: full
> > > > L1d cache:   32K
> > > > L1i cache:   32K
> > > > L2 cache:1024K
> > > > L3 cache:25344K
> > > > NUMA node0 CPU(s):   0-17,36-53
> > > > NUMA node1 CPU(s):   18-35,54-71
> > > > Flags:   fpu vme de pse tsc msr pae mce cx8 apic sep mtrr
> > > > pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx
> > > > pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl
> > > > xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq monitor ssse3
> > > > fma cx16 pcid
> > > > sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx
> > > > f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single
> > > > pti fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm
> > > > mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd
> > > > avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku
> > > > ospke --Network Test--
> > > >
> > > > On Tue, Jun 25, 2019 at 2:35 PM Pedro Larroy
>  wrote:
> > > > >
> > > > > I did a training of cifar10 in CPU and seems there's some
> > > > > regressions in the range of 7% increase of training time against 
> > > > > 1.4.1:
> > > > >
> > > > > (py3_venv)
> > > > > piotr@ip-172-31-63-171:0:~/deeplearning-benchmark/dawnbench
> > > > > (master)+$ time python cifar10.py --epochs 5
> > > > > real11m30.388s
> > > > > user417m7.766s
> > > > > sys 16m57.315s
> > > > >
> > > > > VS 1.4.1:
> > > > > real10m41.994s
> > > > > user392m40.646s
> > > > > sys 12m30.601s
> > > > >
> > > > >
> > > > > On Thu, Jun 20, 2019 at 10:15 PM Lai Wei 
> wrote:
> > > > > >
> > > > > > Hi Anirudh,
> > > > > >
> > > > > > Thanks for jumping into this quickly, I followed up on the issue.
> > > > > >
> > > > > > I was meant for sockeye developer/maintainers to help setup
> > > > > > nightly tests and raise issues early.
> > > > > >
> > > > > > Thanks!
> > > > > >
> > > > > > On Fri, Jun 21, 2019 at 10:10 AM Haibin Lin
> > > > > > 
> > > > > > wrote:
> > > > > >
> > > > > > > In GluonNLP we are testing with MXNET nightly build for each
> > > > > > > PR, and we did find some MXNet related issue caught by the CI.
> > > > > > > I recommend other toolkits also add integration tests with MXNet
> nightly.
> > > > > > > It helps ide

RE: [VOTE] Release Apache MXNet (incubating) version 1.5.0.rc1

2019-06-20 Thread Zhao, Patric
Thanks for raising the issue; we will take a look ASAP.

The downstream test cases are not in the MXNet CI, so it's hard for MXNet developers 
to catch potential bugs or performance degradations.

In the future, I suggest adding the major downstream test cases, e.g. from 
sockeye, GluonNLP, GluonCV, DGL, and Gluon-TS, into the nightly tests.
If that is still too heavy, maybe run them weekly or monthly :)

Thanks,

--Patric

> -Original Message-
> From: Anirudh Subramanian [mailto:anirudh2...@gmail.com]
> Sent: Friday, June 21, 2019 9:31 AM
> To: dev@mxnet.incubator.apache.org
> Cc: d...@mxnet.apache.org
> Subject: Re: [VOTE] Release Apache MXNet (incubating) version 1.5.0.rc1
> 
> Hi Lai,
> 
> I have opened an issue:
> https://github.com/apache/incubator-mxnet/issues/15297
> I came to know about this issue only today and I have not been monitoring
> sockeye.
> I jumped onto this issue to make sure it wasn't caused by the dlpack changes.
> Also, I don't  think sockeye CI checks against master, it is using 1.4.1.
> 
> Anirudh
> 
> 
> On Thu, Jun 20, 2019 at 6:17 PM Lai Wei  wrote:
> 
> > Hi,
> >
> > Could you share which test failed and what’s the crash? How to
> > reproduce it?
> >
> > I was able to install sockeye and run all tests passed. Using python
> > setup.py test
> >
> > I have tested both nightly pip package and 1.5.0.rc1
> >
> > It would be great to create an issue with reproducible steps and move
> > the discussion there.
> >
> > Also I see sockeye nightly build[1] has been failing for some time, if
> > it’s due to MXNet change, please raise this early so we can track and
> > solve it in time rather than block the release during vote time.
> >
> > [1] https://travis-ci.org/awslabs/sockeye
> >
> >
> > On Fri, Jun 21, 2019 at 7:01 AM Anirudh Subramanian
> >  > >
> > wrote:
> >
> > > I was able to reproduce a crash with the commit
> > > 09202f7f261954383aa387144524d38f83f18d06 but not with the commit
> > > a862270beb2d796c1ba311183f7f4a766a18ad6c.
> > >
> > > Anirudh
> > >
> > > On Thu, Jun 20, 2019 at 3:53 PM Lai Wei  wrote:
> > >
> > > > Hi Przemyslaw,
> > > >
> > > > Is there an issue with more details to track the problem?
> > > >
> > > >
> > > > On Fri, Jun 21, 2019 at 6:04 AM Przemysław Trędak
> > > > 
> > > > wrote:
> > > >
> > > > > -1
> > > > >
> > > > > There is a crash in sockeye unit test (python setup.py test)
> > > > > observed starting with nightly 1.5 build from 6/13 and still
> > > > > occuring in
> > > 1.5rc1. I
> > > > > don't yet have the exact commit that is responsible for it, but
> > > > > it is either a862270beb2d796c1ba311183f7f4a766a18ad6c (dlpack
> > > > > related) or
> > > > > 09202f7f261954383aa387144524d38f83f18d06 (cached op
> optimization).
> > > > >
> > > > > On 2019/06/20 06:36:22, Lai Wei  wrote:
> > > > > > Dear MXNet community,
> > > > > >
> > > > > > This is the 3-day vote to release Apache MXNet (incubating)
> > > > > > version
> > > > > 1.5.0.
> > > > > > Voting on dev@ will start June 19, 23:59:59(PST)  and close on
> > June
> > > > 22,
> > > > > > 23:59:59.
> > > > > >
> > > > > > 1) Link to release notes:
> > > > > >
> > > https://cwiki.apache.org/confluence/display/MXNET/1.5.0+Release+Note
> > > s
> > > > > >
> > > > > >
> > > > > > 2) Link to release candidate:
> > > > > >
> > > > > > https://github.com/apache/incubator-mxnet/releases/tag/1.5.0.r
> > > > > > c1
> > > > > >
> > > > > >
> > > > > > 3) Link to source and signatures on apache dist server:
> > > > > >
> > > > > > https://dist.apache.org/repos/dist/dev/incubator/mxnet/1.5.0.r
> > > > > > c1/
> > > > > >
> > > > > >
> > > > > > Please remember to TEST first before voting accordingly:
> > > > > >
> > > > > > +1 = approve
> > > > > > +0 = no opinion
> > > > > > -1 = disapprove (provide reason)
> > > > > > --
> > > > > > Best Regards
> > > > > >
> > > > > > Lai
> > > > > >
> > > > >
> > > > --
> > > > Best Regards
> > > >
> > > > Lai
> > > >
> > >
> > --
> > Best Regards
> >
> > Lai
> >


RE: Proposal - GPU pointwise fusion

2019-06-09 Thread Zhao, Patric
+1 for this proposal. Operator fusion is a very common technique to improve effective 
memory bandwidth and reduce latency.
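As a toy illustration (plain NumPy, not MXNet code), the win from pointwise fusion is turning several memory-bound passes over the data into a single pass with no materialized intermediates:

```python
import numpy as np

def unfused(x, y):
    # Three separate pointwise ops: each one reads and writes whole
    # arrays, so the data crosses the memory bus three times.
    t1 = x + y
    t2 = t1 * 2.0
    return np.maximum(t2, 0.0)  # relu

def fused(x, y):
    # A fused kernel computes the same result in one pass,
    # with no intermediate arrays materialized.
    out = np.empty_like(x)
    for i in range(x.size):
        v = (x.flat[i] + y.flat[i]) * 2.0
        out.flat[i] = v if v > 0.0 else 0.0
    return out

x = np.array([-1.0, 0.5, 2.0])
y = np.array([0.5, -1.0, 1.0])
assert np.allclose(unfused(x, y), fused(x, y))
```

The generated GPU (or CPU) kernel would of course be compiled code, not a Python loop; the point is only that the fused version touches memory once per element.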

My suggestions:
* Flexibility
Because fusion, especially pointwise fusion, is backend and device independent, 
it's better to make the solution flexible rather than limiting it to the GPU backend.
Each backend/device can then provide its own fusion code, via a toolchain or an 
optimized kernel, through the same path.
In the short term, enabling only the GPU kernel is fine, and I will contribute the 
CPU code soon.

* Reuse MXNET_SUBGRAPH_BACKEND env
There are already lots of environment variables, so I suggest reusing the subgraph 
variable, which is already well known and documented.
For example, MXNET_SUBGRAPH_BACKEND=MKLDNN enables CPU fusion today.
https://github.com/apache/incubator-mxnet/blob/master/docs/faq/env_var.md

Questions:
*  " Introduce graph passes that look for subgraphs made of compatible 
pointwise ops and replace them with proper _FusedOp nodes."
What are the "compatible pointwise ops", and are they CUDA-version or HW independent?
Does a developer need to be aware of whether their new op is compatible?

* " Fusion is guarded by MXNET_USE_FUSION environment variable. It should be 
decided what the default should be."
Any hints for the user?
Is it possible for users to switch off some of the fusions, or to add more?

Thanks,

BR,

--Patric



> -Original Message-
> From: Przemysław Trędak [mailto:ptre...@apache.org]
> Sent: Sunday, June 9, 2019 11:57 AM
> To: d...@mxnet.apache.org
> Subject: Proposal - GPU pointwise fusion
> 
> Hello Community,
> 
> DL models, besides compute intensive operations like convolutions and fully
> connected layers, feature a lot of simple pointwise (aka elementwise)
> operations (like elementwise addition etc.). Performance of those operations
> is fully memory bandwidth bound and so it limits speedups from newer GPU
> hardware, which typically has high compute/memory bandwidth ratio. There
> are multiple attempts (e.g. TVM) ongoing to use compiler technology in order
> to deal with this and other, harder performance problems. However,
> integration of e.g. TVM into MXNet is a long term effort and there is a need
> for a simpler, more focused, approach to deal with this problem in the
> meantime.
> 
> This proposal (design doc [1], PR [2]) attempts to be a short term solution to
> this problem - using existing NNVM backend to MXNet and without a big
> refactoring required.
> 
> Any feedback and help will be greatly appreciated.
> 
> Thank you,
> Przemek
> 
> [1]
> https://cwiki.apache.org/confluence/display/MXNET/GPU+Pointwise+fusion
> [2] https://github.com/apache/incubator-mxnet/pull/15167


RE: Context-specific operator parameters

2019-06-04 Thread Zhao, Patric
Thanks for the new proposal. 

My concern with the current proposal is that scripts/code will NOT be portable or 
backward compatible, and that putting such backend-specific info in the operator 
also increases usage complexity.
Say a user sets backend parameters in their script, such as conv algo=Winograd, 
precision=fp16, layout=NHWC, etc.
This group of parameters may give the best performance on the tested HW, but may 
cause performance degradation, or even fail to execute, on different HW.
One example is the GitHub issue where the `layout` parameter caused an error: 
https://github.com/apache/incubator-mxnet/issues/15079

Thus, I think we should remove this kind of context-specific operator parameter, 
like `cudnn_tune`, `cudnn_off`, and `layout`, rather than adding more such 
parameters to operators.
I suggest hiding this kind of optimization and selection in the backend, maybe 
using subgraphs.

Thanks,

--Patric


> -Original Message-
> From: Dick Carter [mailto:dickjc...@apache.org]
> Sent: Tuesday, June 4, 2019 8:21 AM
> To: d...@mxnet.apache.org
> Subject: Context-specific operator parameters
> 
> MXNet has a number of context-specific operator parameters:  'cudnn_tune',
> 'cudnn_off' and 'workspace' are parameters that control the behavior of
> Convolution on gpu contexts with NVIDIA gpus.  Even with these, there
> would be benefits to having additional parameters, e.g. to  set Convolution
> algos by number, or force the compute precision to float16.  With the desire
> to support multiple backends and a growing number of operators, it's time to
> ask the question, "Is this scalable?"
> 
> I propose that, rather than adding a new parameter at the Python level for
> each new backend-specific parameter 'knob', all context-specific parameters
> be swept into a single dictionary, called e.g. 'ctx_params':
> 
> Convolution(..., ctx_params= {'cudnn_tune': 2, 'cudnn_off': False,
> 'workspace': 2000}, ...)
> 
> I'll stop short of working out all the details to hopefully generate more
> discussion.  Some open questions:
> 
> Do all backends share the same namespace, or do we have separate
> 'gpu_ctx_params', 'cpu_ctx_params', etc.?
> 
> Is there a clean extension to the general parameter parsing facility of dmlc 
> to
> handle this dictionary, and what form do these extension params take in the
> backend, Map?
> 
> And while this proposes to organize and consolidate these context-specific
> parameters at the Python level, we'd need to tolerate (and auto-create)
> documentation for these new parameters.
> 
> Other approaches welcome.
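As a toy sketch of what consuming such a dictionary might look like on the frontend (hypothetical names and key set; the real dmlc parameter parsing would differ), one benefit of a single `ctx_params` dict is that unknown keys can be detected and warned about instead of silently ignored:

```python
# Hypothetical validation of a per-context parameter dictionary,
# illustrating the ctx_params idea from the proposal above.
KNOWN_GPU_KNOBS = {"cudnn_tune", "cudnn_off", "workspace"}

def split_ctx_params(ctx_params):
    """Separate recognized backend knobs from unknown keys so the
    frontend can warn on typos instead of silently dropping them."""
    known = {k: v for k, v in ctx_params.items() if k in KNOWN_GPU_KNOBS}
    unknown = sorted(set(ctx_params) - KNOWN_GPU_KNOBS)
    return known, unknown

known, unknown = split_ctx_params(
    {"cudnn_tune": 2, "cudnn_off": False, "workspace": 2000, "wrokspace": 1})
print(unknown)  # ['wrokspace']
```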


RE: [Announcement] New Committer - Yuxi Hu

2019-05-23 Thread Zhao, Patric
Congratulations, Darren :) Thanks for your great work on Horovod.

> -Original Message-
> From: Chaitanya Bapat [mailto:chai.ba...@gmail.com]
> Sent: Friday, May 24, 2019 9:46 AM
> To: dev@mxnet.incubator.apache.org
> Subject: Re: [Announcement] New Committer - Yuxi Hu
> 
> Congratulations Darren!
> 
> On Fri, 24 May, 2019, 12:51 AM Sheng Zha,  wrote:
> 
> > Hi all,
> >
> > Please join me in welcoming Yuxi (Darren) Hu as a new committer of
> > Apache MXNet (incubating)!
> >
> > Yuxi has been one of the core contributors of Horovod integration in
> > MXNet. Along the way, he has been making meaningful contributions to
> > improve the mxnet backend, such as introducing API for engine push to
> > make it easier to integrate horovod and external operator library.
> >
> > Welcome, Darren!
> >
> > -sz
> >
> >


RE: [DISCUSS] 1.5.0 Release Plan

2019-05-23 Thread Zhao, Patric
Thanks, Lai.  

With great help from the community, all PRs listed in the roadmap are done :)
https://github.com/apache/incubator-mxnet/issues/14619#issuecomment-480110642

Update the status of the below list

 - [1] PR#14713 is almost done and wait for internal validation results
 - [2] PR#14893 is merged
 - [3] PR#15031 is merged
 - [7] PR#15038 is a new PR to fix a bug in the C++ interface; it will be merged soon 
after review.

Feel free to let me know if there is anything our team can help with :)

BR,

--Patric

> -Original Message-
> From: Lai Wei [mailto:roywei...@gmail.com]
> Sent: Thursday, May 23, 2019 6:05 AM
> To: dev@mxnet.incubator.apache.org
> Subject: Re: [DISCUSS] 1.5.0 Release Plan
> 
> Hi @dev,
> 
> Thanks for working hard for the 1.5 release, since there has been several
> release blockers (mostly fixed). We are extending the code freeze to Friday
> 05/22/2019. Right now we are tracking the following 5 open PRs[1][2][3][4][5]
> and 1 issue[6]. Please let us know if you need more time.
> 
> I would like to encourage all downstream projects to test with latest MXNet
> to avoid any incompatibility in the coming 1.5.0 release. If you have any
> issues that may block the release, please let us know.
> Thank you very much.
> 
> [1] https://github.com/apache/incubator-mxnet/pull/14713
> [2] https://github.com/apache/incubator-mxnet/pull/14893
> [3] https://github.com/apache/incubator-mxnet/pull/15031
> [4] https://github.com/apache/incubator-mxnet/pull/15039
> [5] https://github.com/apache/incubator-mxnet/pull/15041
> [6] https://github.com/apache/incubator-mxnet/issues/15034
> 
> 
> Best Regards
> 
> Lai
> 
> 
> On Wed, May 15, 2019 at 9:05 PM Junru Shao 
> wrote:
> 
> > Hi folks,
> >
> > Here I may have a release blocker for 1.5.0 about implementation of
> > dynamic shape mechanism, which somehow conflicts with Gluon's
> deferred
> > initialization [1].
> >
> > [1] https://github.com/dmlc/gluon-nlp/issues/706
> >
> > On Wed, May 15, 2019 at 12:09 PM Anirudh Subramanian <
> > anirudh2...@gmail.com>
> > wrote:
> >
> > > Hi Lai,
> > >
> > > From the discussion I had with Nvidia offline they are targeting on
> > pushing
> > > the required changes today.
> > > Since this is important feature for the release, if this gets
> > > delayed and cannot  be merged by 05/17/2019, the code freeze date
> > > may need to be changed.
> > >
> > > Anirudh
> > >
> > > On Wed, May 15, 2019 at 1:23 AM Lv, Tao A  wrote:
> > >
> > > > Hi dev,
> > > >
> > > > We see there are several github issues [1][2][3][4] about mxnet
> > > > windows build experience. The team is working intensively
> > > > [5][6][7] on that to
> > > fix
> > > > some problems of MKL-DNN build on windows. We hope these fixes
> can
> > catch
> > > > the code freeze and finally enter the 1.5.0 release.
> > > >
> > > > The PR against mshadow (#374) was already merged and MXNet PR
> > > > #14877 is under review - great thanks to CI team for helping on
> > > > the MKL
> > > installation
> > > > request. PR #14952 is document change according to build logic
> > > > changes
> > in
> > > > PR #14877. So I think these two PRs should be merged simultaneously.
> > > > Currently #14877 is experiencing a CI response problem.
> > > >
> > > > Please take your time to have a look at these two PRs. Your
> > > > comments
> > and
> > > > suggestions are highly appreciated.
> > > >
> > > > Thanks,
> > > > -tao
> > > >
> > > > [1] https://github.com/apache/incubator-mxnet/issues/14670
> > > > [2] https://github.com/apache/incubator-mxnet/issues/14335
> > > > [3] https://github.com/apache/incubator-mxnet/issues/14203
> > > > [4] https://github.com/apache/incubator-mxnet/issues/14085
> > > > [5] https://github.com/apache/incubator-mxnet/pull/14877
> > > > [6] https://github.com/dmlc/mshadow/pull/374
> > > > [7] https://github.com/apache/incubator-mxnet/pull/14952
> > > >
> > > > -Original Message-
> > > > From: Lai Wei [mailto:roywei...@gmail.com]
> > > > Sent: Wednesday, May 15, 2019 2:57 PM
> > > > To: dev@mxnet.incubator.apache.org
> > > > Subject: Re: [DISCUSS] 1.5.0 Release Plan
> > > >
> > > > Hi Anirudh,
> > > >
> > > > I see there was an offline disucssion <
> > > >
> > >
> > https://github.com/apache/incubator-
> mxnet/pull/14173#pullrequestreview
> > -235846341
> > > > >
> > > > and I have updated the AMP feature and your project on the release
> > > tracker
> > > > <
> > > >
> > >
> > https://cwiki.apache.org/confluence/display/MXNET/1.5.0+Release+Plan+a
> > nd+Status
> > > > >
> > > > ,
> > > > Please let me know if you have any updates.
> > > >
> > > > Hi @dev,
> > > > This is a gentle reminder that  the code freeze for 1.5.0 release
> > > > is on 05/17/2019, please let us know if you have any WIP pull
> > > > requests aiming
> > > for
> > > > 1.5.0 that needs attention.
> > > > Please understand we already have around 650 commits in master
> > > > that
> > need
> > > > to be released in time. We understand TensorRT test in CI is
> > > > failing
> > and

RE: [Announcement] New Committer - Zhennan Qin

2019-04-30 Thread Zhao, Patric
Congrats, Zhennan.

Really great work; it makes MXNet's quantization flow stand out worldwide!

> -Original Message-
> From: Lv, Tao A [mailto:tao.a...@intel.com]
> Sent: Tuesday, April 30, 2019 11:01 PM
> To: dev@mxnet.incubator.apache.org
> Subject: RE: [Announcement] New Committer - Zhennan Qin
> 
> Congratulations Zhennan!
> 
> -Original Message-
> From: Jun Wu [mailto:wujun@gmail.com]
> Sent: Tuesday, April 30, 2019 12:29 PM
> To: dev@mxnet.incubator.apache.org
> Subject: [Announcement] New Committer - Zhennan Qin
> 
> Please join me in welcoming Zhennan Qin (https://github.com/ZhennanQin)
> from Intel as a new committer.
> 
> Zhennan is the main author of accelerating MXNet/MKLDNN inference
> through operator fusion and model quantization. His work has placed MXNet
> in an advantageous place for inference workloads on Intel CPUs compared
> with other DL frameworks.


RE: [MXNET 2.0 Wishlist] [DISCUSS] Refine the InferStorageType and memory planning pass

2019-04-09 Thread Zhao, Patric
BTW, "maintainability, testability and readability" has always been our design goal, 
from the very start of the MKL-DNN integration :)

> -Original Message-
> From: Lv, Tao A [mailto:tao.a...@intel.com]
> Sent: Wednesday, April 10, 2019 11:03 AM
> To: dev@mxnet.incubator.apache.org
> Subject: RE: [MXNET 2.0 Wishlist] [DISCUSS] Refine the InferStorageType and
> memory planning pass
> 
> 
> Thank you Tianqi and Sam for the kind suggestions.
> 
> @Tianqi,
> 
> Can you please point me to the code of this pass or do you think anyone
> from TVM community can help to educate me on this? I'm very happy to
> learn from that.
> 
> Just one note, we are not only doing layout transformation but also want to
> have more memory for layout transformation.
> For example, (N=32, C=3, H=256, W=256) will be padded to (N=32, C=16,
> H=256, W=256) on channel dimension then convert (N=32, C=16, H=256,
> W=256) to nchw16c so we can leverage corresponding optimal computation
> kernels.
> That's why we also need changes to the memory planning pass.
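For concreteness, the padding plus blocking that Tao describes can be sketched in NumPy. This is a toy illustration only; the real reorder happens inside MKL-DNN with optimized kernels, and the function name here is made up:

```python
import numpy as np

def to_nchw16c(x, block=16):
    """Pad channels up to a multiple of `block`, then reorder
    NCHW -> nChw16c (channels split into inner blocks of 16)."""
    n, c, h, w = x.shape
    c_pad = -(-c // block) * block           # round C up, e.g. 3 -> 16
    padded = np.zeros((n, c_pad, h, w), dtype=x.dtype)
    padded[:, :c] = x                         # zero-pad the extra channels
    # Split C into (C/16, 16) and move the 16-wide block innermost,
    # so vector units can process 16 contiguous channels at a time.
    return padded.reshape(n, c_pad // block, block, h, w).transpose(0, 1, 3, 4, 2)

x = np.random.rand(32, 3, 256, 256).astype(np.float32)
y = to_nchw16c(x)
print(y.shape)  # (32, 1, 256, 256, 16)
```

This also shows why memory planning needs to know about the backend: the padded buffer is larger than what the operator's arithmetic shape formula alone would predict.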
> 
> 
> @Sam,
> 
> Yes, definitely we're treating MKL-DNN as an accelerator on CPU. Previously
> we used it to accelerate certain critical operators in MXNet in certain
> situations, eg. FP32 convolution/deconvolution/fullyConnected, etc. But
> along with the evolving of both MXNet and MKL-DNN, we started to do more
> which might not supported by MXNet in original CPU implementation, such
> as quantization and graph fusion. So MKL-DNN backend is also changing from
> a simple `accelerator` to a `default` backend on CPU. And I totally agree with
> you that we need think more about the software architecture for
> maintainability, testability and readability - that's why I sent out this 
> proposal
> to get more ideas from the community.
> 
> 
> -tao
> 
> -Original Message-
> From: Skalicky, Sam [mailto:sska...@amazon.com.INVALID]
> Sent: Wednesday, April 10, 2019 2:24 AM
> To: dev@mxnet.incubator.apache.org
> Subject: Re: [MXNET 2.0 Wishlist] [DISCUSS] Refine the InferStorageType and
> memory planning pass
> 
> I agree with Tianqi. We should let MKLDNN partitipate in memory planning
> by first having a separate NNVM pass and then using that info in the regular
> memory planning phase.
> 
> Its starting to sound like MKLDNN should be treated like an accelerator rather
> than an operator library. As it has explicit needs and can provide 
> acceleration
> when given extra capabilities in MXNet like having input to the memory
> planning NNVM pass. It also has special tensor formatting needs and
> conversions that could be best architected in another way than they
> currently are.
> 
> We need to think about how we want to architect this for maintainability,
> testability, and readability.
> 
> Sam
> 
> 
> > On Apr 9, 2019, at 11:11 AM, Tianqi Chen 
> wrote:
> >
> > The layout transformation should really be a separate optimization
> > pass rather than memory planning. As is done in the TVM stack. If we
> > want to do a clean slate solution, I would recommend looking into that
> instead.
> >
> > TIanqi
> >
> > On Tue, Apr 9, 2019 at 1:46 AM Lv, Tao A  wrote:
> >
> >>
> >>
> >> Hi dev,
> >>
> >>
> >>
> >> As we're discussing the roadmap for MXNet 2.0, I would like to start
> >> a thread about refining the InferStorageType and memory planning pass
> >> in MXNet and hope it can happen as a part of the 2.0 release.
> >>
> >>
> >>
> >> Thanks to @eric-haibin-lin, part of the proposal has already been
> >> discussed in issue #13598 [1].
> >>
> >>
> >>
> >> As mentioned in the description of issue #13598, there are several
> >> drawbacks of the existing flow. Please allow me to quote them here:
> >> *the selection of MKL/CPU/GPU/CUDNN implementation happens
> after
> >> graph attribute inference and memory planning, memory planning is
> >> thus not aware of the implementation that will be used for execution
> >> in the future, which may result in sub-optimal result. For example,
> >> the memory inplace option may vary depending on the accelerator
> >> backend (the new version of CUDNN enables x/dx inplace for
> _backward_conv).
> >> *some sparse operator need to access dtype/shape information to
> >> decide which implementation to invoke for execution, and whether to
> >> perform fallback. This information is not yet exposed in the existing
> >> infer storage type interface.
> >>
> >>
> >>
> >> Besides, the existing memory planning pass calculates and afterwards
> >> allocates memory strictly according to the input/output tensor shapes
> >> (which can be got from operators' arithmetic formulas through
> InferShape).
> >> That's not true anymore when we come to accelerators like MKL-DNN on
> >> CPU which wants to pad input/output tensor to optimal formats (eg.
> >> nchw16c) according to hardware architecture. It also can be described
> >> as shape + stride. As many of you know, MKL-DNN shows great
> >> performance on these optimal formats which is blocked by the vector
> length 

RE: assimilation of mshadow into the MXNet codebase

2019-04-07 Thread Zhao, Patric
Agree.

Recently, we (Tao, Shufan, Pengxin) have been trying to integrate the Intel MKL math 
functions into mshadow and MXNet.
We have to work across two repos and make lots of tradeoffs between them.
If we move mshadow into MXNet, it will be easier to redesign and refactor parts of 
the legacy code.

> -Original Message-
> From: Sheng Zha [mailto:zhash...@apache.org]
> Sent: Monday, April 8, 2019 5:48 AM
> To: d...@mxnet.apache.org
> Subject: Re: assimilation of mshadow into the MXNet codebase
> 
> mshadow depends on *a* BLAS library, and there's nothing inherent in
> mshadow code base that requires OpenBLAS over MKL. The linked issue
> #11769 seems to be more of a build logic issue.
> 
> -sz
> 
> On 2019/04/07 18:56:43, Aaron Markham 
> wrote:
> > +1
> > Reduced complexity. Choice of math library... Hopefully you can just
> > install MKL and not be forced into mshadow's dependency on OpenBLAS.
> > This could make Windows setup easier.
> > Maybe this issue will get fixed: #11769.
> >
> > On Sun, Apr 7, 2019, 00:51 Junru Shao  wrote:
> >
> > > Does merging mshadow into mxnet bring any actual benefit for
> > > customers in sense of performance, portability, or anything else?
> > >
> > > On Fri, Apr 5, 2019 at 9:38 PM Tianqi Chen
> > > 
> > > wrote:
> > >
> > > > Technically, mshadow is sufficient for MXNet. Adopting other
> > > > libraries ( eigen or xtensor) will unnecessarily increase the
> > > > codebase complexity without any additional gains.
> > > >
> > > > Given that mshadow is only used by mxnet. I do support donating it
> > > > into mxnet codebase.
> > > > To respect the original mshadow community. I would recommend
> > > > starting a community RFC In the mshadow github issue for a week,
> > > > before we start the migrating process.
> > > > Also, I would recommend a rebase merge just like the case of
> > > > MXNet.jl
> > > code
> > > > base to preserve the contribution history.
> > > >
> > > > Tianqi
> > > >
> > > >
> > > > On Fri, Apr 5, 2019 at 9:25 PM Alfredo Luque
> > > >  wrote:
> > > >
> > > > > Do you have a link to both of these proposals?
> > > > >
> > > > > On Fri, Apr 5, 2019 at 20:14 Anirudh Acharya
> > > > > 
> > > > > wrote:
> > > > >
> > > > > > Hi Pedro,
> > > > > >
> > > > > > mshadow is mostly used for tensor arithmetic. There have been
> > > > discussions
> > > > > > about including it within mxnet. I think it is a good idea.
> > > > > >
> > > > > > As a more long term solution using libraries like eigen to
> > > > > > perform
> > > > linear
> > > > > > algebra operations was also suggested by anirudh2290@. I think
> > > > xtensor(
> > > > > > https://github.com/QuantStack/xtensor ) can also be a
> > > > > > candidate
> > > here.
> > > > > >
> > > > > > -
> > > > > > Anirudh
> > > > > >
> > > > > >
> > > > > > On Fri, Apr 5, 2019 at 7:03 PM Pedro Larroy <
> > > > > pedro.larroy.li...@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > > Hi
> > > > > > >
> > > > > > > Some developers have noticed that working in mshadow is
> > > > > > > cumbersome
> > > as
> > > > > > > it's a 3rdparty subrepo.
> > > > > > >
> > > > > > > Since mshadow is a bunch of headers which don't have much of
> > > > > > > independent tests / library functionality, me and other
> > > > > > > developers believe that it would be good to assimilate this
> > > > > > > code in the repository for ease of contribution and changes
> > > > > > > without having to
> > > go
> > > > > > > trough contortions to test PRs that modify mshadow.
> > > > > > >
> > > > > > > Would anybody oppose this change?
> > > > > > >
> > > > > > > Thanks and have a nice weekend.
> > > > > > >
> > > > > > > Pedro.
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >


RE: [MXNET 2.0 Wishlist] [DISCUSS] Single build system

2019-04-05 Thread Zhao, Patric
+1 single build system.



> -Original Message-
> From: Qing Lan [mailto:lanking...@live.com]
> Sent: Friday, April 5, 2019 5:27 AM
> To: dev@mxnet.incubator.apache.org
> Subject: Re: [MXNET 2.0 Wishlist] [DISCUSS] Single build system
> 
> +1 to have a single build system
> 
> Currently the way of publish and the way of doing CI test is very different.
> The instruction shown on the website should match the way we deliver it to
> the users.
> Having a single build process would simplify the maintainance cost and reach
> to the best performance.
> 
> Thanks,
> Qing
> 
> 
> From: Marco de Abreu 
> Sent: Thursday, April 4, 2019 15:01
> To: dev@mxnet.incubator.apache.org
> Subject: Re: [MXNET 2.0 Wishlist] [DISCUSS] Single build system
> 
> +1 towards having a single build system
> 
> I'd like to add the benefit of this approach allowing us to have the same 
> build
> logic across all operating systems. It would be great if we could make
> x86/Unix, x86/windows, x86/mac and ARM/Unix first class citizens from the
> beginning.
> 
> -Marco
> 
> Kellen Sunderland  schrieb am Do., 4. Apr. 2019, 12:31:
> 
> > Hello MXNet devs,
> >
> > I'd like to start a thread discussing what our build system should
> > look like in MXNet 2.0.  I'd propose that although the current make
> > system has served us well in the past, we remove it along with the
> > bump to 2.0.  The end goal I'd like to see is that we have a clean
> > build system, without a bunch of conditional logic that makes
> > contributing and testing MXNet a simpler process.  Additionally I'd
> > propose we target a minimum cmake version of 3.7 for reasons described
> below.
> >
> > First I'd like to give some context on why I'd propose we don't just
> > switch to cmake, but we also target a relatively new version (version
> > 3.7 from Nov, 2016) of cmake.  The largest benefits in making this
> > change would apply to CUDA builds where cmake itself has quite
> > inconsistent functionality between versions.  One persistent annoyance
> > I've had with cmake is that we've had conditional logic for the
> > FindCUDA command which at one point targeted some modern cmake
> > features, but then in subsequent versions of cmake the way these
> > features works was tweaked, and now I find these cmake features are
> > consistently broken to the point that I require a bunch of -D defines
> > to compile properly or to use an IDE.  An additional CUDA related
> > issue is that every time there's a new SM added to NVCC we have to
> > make a few source changes to support it.  I could see this being
> > problematic for users who may suddenly realize that due to their
> > compilation settings, they may not actually be enabling the features they
> think they are with their shiny new GPUs.
> >
> > As an alternative if we, for example, target cmake 3.7 at a minimum,
> > and we want to find cuda and then build a list of reasonable PTX/BINS
> > we could use the following command[1]:
> >
> > 
> > FindCUDA(...)
> > ...
> > CUDA_SELECT_NVCC_ARCH_FLAGS(ARCH_FLAGS 3.0 3.5+PTX 5.2(5.0)
> Maxwell)
> >   LIST(APPEND CUDA_NVCC_FLAGS ${ARCH_FLAGS})
> > 
> >
> > Simple, concise, and it would help to make the building experience
> > more consistent across platforms, build environments and IDEs (looking
> > at you CLion).  We'd of course need to do a little experimentation
> > work to make sure that this does indeed work as intended, and can
> > replace the currently complex findCuda logic we have in our build
> > systems, but for the sake of the proposal let's assume these cmake
> > commands do indeed work consistently as documented from cmake 3.7
> onwards.
> >
> > To give users a chance to update their tooling I'd also suggest we
> > begin warning users at least a release in advance that make based
> > builds will be deprecated in MXNet 2.0 so they can begin migrating to
> > cmake.  I'd also want to display deprecation messages for unused cmake
> > flags (such as the profiler flag) for a release before the 2.0
> > release, and then remove them in 2.0.
> >
> > Of course not all users have cmake 3.7 on their systems, some of our
> > employers force use to use ridiculously outdated linux distributions.
> > The good news for these users is that if we can offer Docker
> > compilation with an image that has a supported version of cmake and we
> > should be able to build a portable binary that work even with very old
> > distributions of Linux.  Additionally installing cmake from source is
> > also fairly straightforward [2] and works quite well on older distros in my
> experience.
> >
> > Looking forward to hearing what others think.  Any preferred build
> > systems that you all would want to use?  Is cmake the right system to
> > centralize on?  If so, is version 3.7 a reasonable minimum version to
> > target?  Is the
> > 2.0 release a good point at which we can think about simplifying build
> > logic?
> >
> > 1: 
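[Editor's note] The arch-flags approach sketched in the message above could look roughly like the following CMakeLists.txt fragment. This is a minimal sketch assuming cmake >= 3.7; the project name and `demo_ops` target are made up for illustration, and only `CUDA_SELECT_NVCC_ARCH_FLAGS` comes from the proposal itself.

```cmake
# Hypothetical CMakeLists.txt fragment (cmake >= 3.7); names are illustrative.
cmake_minimum_required(VERSION 3.7)
project(mxnet_cuda_demo CXX)

find_package(CUDA REQUIRED)

# Let the FindCUDA module pick reasonable PTX/SASS targets instead of
# hand-maintaining a list of -gencode flags for every new SM version.
CUDA_SELECT_NVCC_ARCH_FLAGS(ARCH_FLAGS 3.0 3.5+PTX 5.2(5.0) Maxwell)
LIST(APPEND CUDA_NVCC_FLAGS ${ARCH_FLAGS})

# Build a CUDA library with the selected flags.
cuda_add_library(demo_ops demo_ops.cu)
```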

RE: Call for Ideas and Approaches to Community Building

2019-03-18 Thread Zhao, Patric
Regarding marketing, here is one example from GTC, which is ongoing in San 
Jose, US now.

https://gputechconf2019.smarteventscloud.com/connect/search.ww#loadSearch-searchPhrase==session=0=dayTime=

There are 850 sessions in total, and I searched for the keywords MXNet, PyTorch and 
TensorFlow.
Only a handful of sessions mention MXNet, even though MXNet offers lots of great 
features and advantages.

I suggest we encourage and fund students/researchers to present their work 
at popular conferences.
I know talking is easy, but maybe the decision makers can allocate more resources 
for marketing.


MXNet 7 times, about 0.8%
https://gputechconf2019.smarteventscloud.com/connect/search.ww#loadSearch-searchPhrase=MXNet=session=0=dayTime=

Pytorch 22 times, 2%
https://gputechconf2019.smarteventscloud.com/connect/search.ww#loadSearch-searchPhrase=pytorch=session=0=dayTime=

Tensorflow 46 times, 5%
https://gputechconf2019.smarteventscloud.com/connect/search.ww#loadSearch-searchPhrase=tensorflow=session=0=dayTime=
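[Editor's note] The percentages quoted above can be checked with a quick stdlib-Python calculation; the session counts are the ones reported in the searches linked above (the original message rounds PyTorch and TensorFlow down to 2% and 5%):

```python
# Session counts from the GTC search results quoted above (850 sessions total).
total = 850
counts = {"MXNet": 7, "PyTorch": 22, "TensorFlow": 46}
for name, hits in counts.items():
    # ".1%" multiplies by 100 and appends a percent sign.
    print(f"{name}: {hits / total:.1%}")
# → MXNet: 0.8%
# → PyTorch: 2.6%
# → TensorFlow: 5.4%
```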

--Patric

> -Original Message-
> From: Lin Yuan [mailto:apefor...@gmail.com]
> Sent: Monday, March 18, 2019 5:15 AM
> To: dev@mxnet.incubator.apache.org
> Subject: Re: Call for Ideas and Approaches to Community Building
> 
> Zach,
> 
> Thanks for joining in the mxnet project and your very thoughtful discussion.
> We do have virtual hangout/meetups. Please refer to
> https://cwiki.apache.org/confluence/display/MXNET/Meetups+and+Hangou
> ts
> 
> I also strongly agree with your 4). I think we should have a clear roadmap on
> our wiki page and/or github repo.
> 
> Again, welcome on board!
> 
> Lin
> 
> 
> On Sun, Mar 17, 2019 at 7:33 AM Zhao, Patric 
> wrote:
> 
> > Very great points!
> >
> > +1 for 4) and 5)
> >
> >
> > > -Original Message-
> > > From: Zach Boldyga [mailto:z...@scalabull.com]
> > > Sent: Sunday, March 17, 2019 8:33 AM
> > > To: dev@mxnet.incubator.apache.org
> > > Subject: Re: Call for Ideas and Approaches to Community Building
> > >
> > > This is a great discussion, thanks for opening, Carin!
> > >
> > > As a newcomer to MXNet and Apache communities in general, I’ve been
> > > considering what I can bring to the table here, and what importance
> > > it
> > would
> > > have to me.
> > >
> > > I'm not employed by large organizations, and communities like this
> > > are perhaps the only way to be involved in projects of such a large
> > > scale and importance. An opportunity to join this type of team
> > > without the full commitment of employment is fantastic! I see
> > > potential for this to be a
> > form
> > > of validation, a chance to meet others and build professional
> > relationships,
> > > and a vehicle to learn from some of the most well-educated people in
> > > the industry.
> > >
> > > That said, here’s what I’ve noticed thus far:
> > >
> > > 1. There is a healthy amount of activity in Github Issues, and the
> > committers
> > > are doing a great job at allowing newcomers to jump in. I was able
> > > to get started on my first ticket within 10 minutes of searching thru 
> > > issues.
> > >
> > > 2. The dev mailing list is a great place to discuss all of the
> > > nuances
> > of the
> > > project. I also like meeting people and it would be rewarding to get
> > > to
> > know
> > > people in the community via Skype or in-person meetups! This doesn’t
> > > have to be for everyone, and I don’t think it’s appropriate for Q,
> > > but for
> > some
> > > people a social element purely for the sake of putting names with
> > > faces
> > can
> > > be rewarding. I’m open to virtual meetups :)
> > >
> > > 3. My first commit was smooth. When approaching the second one, I’m
> > > hitting some hiccups. For instance, I recently created a JIRA ticket
> > based on a
> > > Github Issue some users reported, and the ticket has been sitting
> > > for a
> > week
> > > without any activity. Should I just dig in and open a PR? How do the
> > > commiters decide what can and can’t reasonably go into the project?
> > > We may be able to make some changes to the contribution
> > > documentation or processes to make it easier for first time
> > > contributors to ramp-up into
> > regular
> > > contributors?
> > >
> > > 4. I would love to see more discussion about the future of MXNet. I
> > imagine
> > > those who have been involved in the project f

RE: Call for Ideas and Approaches to Community Building

2019-03-17 Thread Zhao, Patric
Very great points!   

+1 for 4) and 5)


> -Original Message-
> From: Zach Boldyga [mailto:z...@scalabull.com]
> Sent: Sunday, March 17, 2019 8:33 AM
> To: dev@mxnet.incubator.apache.org
> Subject: Re: Call for Ideas and Approaches to Community Building
> 
> This is a great discussion, thanks for opening, Carin!
> 
> As a newcomer to MXNet and Apache communities in general, I’ve been
> considering what I can bring to the table here, and what importance it would
> have to me.
> 
> I'm not employed by large organizations, and communities like this are
> perhaps the only way to be involved in projects of such a large scale and
> importance. An opportunity to join this type of team without the full
> commitment of employment is fantastic! I see potential for this to be a form
> of validation, a chance to meet others and build professional relationships,
> and a vehicle to learn from some of the most well-educated people in the
> industry.
> 
> That said, here’s what I’ve noticed thus far:
> 
> 1. There is a healthy amount of activity in Github Issues, and the committers
> are doing a great job at allowing newcomers to jump in. I was able to get
> started on my first ticket within 10 minutes of searching thru issues.
> 
> 2. The dev mailing list is a great place to discuss all of the nuances of the
> project. I also like meeting people and it would be rewarding to get to know
> people in the community via Skype or in-person meetups! This doesn’t have
> to be for everyone, and I don’t think it’s appropriate for Q, but for some
> people a social element purely for the sake of putting names with faces can
> be rewarding. I’m open to virtual meetups :)
> 
> 3. My first commit was smooth. When approaching the second one, I’m
> hitting some hiccups. For instance, I recently created a JIRA ticket based on 
> a
> Github Issue some users reported, and the ticket has been sitting for a week
> without any activity. Should I just dig in and open a PR? How do the
> commiters decide what can and can’t reasonably go into the project? We
> may be able to make some changes to the contribution documentation or
> processes to make it easier for first time contributors to ramp-up into 
> regular
> contributors?
> 
> 4. I would love to see more discussion about the future of MXNet. I imagine
> those who have been involved in the project for a long time have thoughts
> about next major steps, but as an outsider I’m not sure where to find this
> information. The roadmap on Github is fairly short-term and outdated, and
> lots of interesting ideas are sprouting in projects like TF Swift as of 2019.
> 
> 5. Something I’ve observed across many Apache projects: there isn’t much
> focus on marketing. I wonder why? A tool like Tensorflow is reaching 10x
> more people, mainly because of marketing.
> 
> Best,
> 
> Zach Boldyga
> Scalabull  |  Founder
> 1 (866) 846-8771 x 101
> 
> 
> On Thu, Mar 7, 2019 at 5:38 AM Tianqi Chen 
> wrote:
> 
> > what happens (also) happens in the mail-list.
> >
> > If there is a certain things or person’s contribution is only known by
> > colleagues, it is a indication of things that should be improved
> > toward more apache way.
> >
> > Tianqi
> >
> > On Thu, Mar 7, 2019 at 4:42 AM Isabel Drost-Fromm 
> > wrote:
> >
> > > On Wed, Mar 06, 2019 at 10:03:57PM -0800, Steffen Rochel wrote:
> > > > I agree with Tianqi on "One approach toward building a more
> > > > diverse community is to acknowledge the fact that we want to
> > > > encourage
> > > interactions
> > > > in the Apache way beyond our physical cycle." However, I disagree
> > > > with
> > > his
> > > > suggestion regarding "One principle to toward that is to encourage
> > > > PMC members only nominate committers from other organizations" for
> > > > the following reasons: [...]
> > >
> > > I spent quite some time digging remembering that a similar topic had
> > > been discussed somewhere at the ASF at some point in time with many
> > > whys, pros and cons towards contributor employer diversity - finally
> > > found a long and winding thread there:
> > >
> > >
> > >
> >
> https://lists.apache.org/thread.html/7a7412316ddbe1d43f5fb3d3703ea25a6
> >
> b26e56de602e27e175785c0@1337815698@%3Cgeneral.incubator.apache.or
> g%3E
> > >
> > >
> > > There is one answer in there from Roy Fielding which has a similar
> > > story to the one that you are describing, Steffen. My main takeaway
> > > of what was discussed back then: "Diversity is only a warning sign
> > > that means we need to check for decisions made in our forums and
> > > advise accordingly."
> > >
> > > The questions I personally tend to ask myself: How easy is it to
> > > follow
> > the
> > > project from just subscribing to it's mailing lists (remember the
> > > "if it didn't happen on the mailing list, it didn't happen"), get
> > > active, get involved, be treated as a fellow project member and be
> > > voted in as committer and PMC member.
> > >
> > > For a more condensed text on the topic of 

RE: [Announcement] New Committer - Patric Zhao

2019-03-15 Thread Zhao, Patric
I am very glad to have this opportunity to contribute to the Apache/MXNet 
community :)

Thanks for all of the support from the community and Intel.

BR,

--Patric


> -Original Message-
> From: MiraiWK WKCN [mailto:w...@live.cn]
> Sent: Friday, March 15, 2019 12:52 AM
> To: dev@mxnet.incubator.apache.org; patric zhao 
> Subject: Re: [Announcement] New Committer - Patric Zhao
> 
> Welcome Peng Zhao!
> Peng is the AI Tech Leader in Intel Corporation. We have good cooperation
> before. He is very professional and contribute a lot to MXNet, especially deep
> learning boost on CPU.
> 
> 
> From: Anirudh Subramanian 
> Sent: Thursday, March 14, 2019 3:54:50 PM
> To: dev@mxnet.incubator.apache.org; patric zhao
> Subject: [Announcement] New Committer - Patric Zhao
> 
> Hi all,
> 
> Please join me to welcome Patric Zhao as a new committer of Apache
> (incubating) MXNet!
> 
> Patric has put in great effort around MKLDNN integration into MXNet and has
> been involved in features like quantization, graph fusion and fused RNN
> operators for CPU.
> 
> Dev List activity:
> https://lists.apache.org/list.html?d...@mxnet.apache.org:lte=3y:patric.zhao
> 
> Issues:
> https://github.com/apache/incubator-
> mxnet/issues?utf8=%E2%9C%93=is%3Aissue+involves%3Apengzhao-intel+
> 
> PR Reviews:
> https://github.com/apache/incubator-
> mxnet/pulls?utf8=%E2%9C%93=is%3Apr+reviewed-by%3Apengzhao-intel
> 
> Proposals involved in:
> https://cwiki.apache.org/confluence/display/MXNET/MXNet+Graph+Optimiz
> ation+and+Quantization+based+on+subgraph+and+MKL-DNN
> https://cwiki.apache.org/confluence/display/MXNET/Fused+RNN+Operators
> +for+CPU
> 
> 
> Thanks,
> Anirudh


RE: [Announcement] New Committer - Kan Wu (@wkcn)

2019-02-18 Thread Zhao, Patric
Congratulations! 

We have cooperated with Kan before; he is easy to communicate with and very 
professional :)

It's well deserved!

> -Original Message-
> From: Lv, Tao A [mailto:tao.a...@intel.com]
> Sent: Tuesday, February 19, 2019 2:17 PM
> To: dev@mxnet.incubator.apache.org; d...@mxnet.apache.org
> Cc: Anirudh Subramanian ; Jackie Wu
> 
> Subject: RE: [Announcement] New Committer - Kan Wu (@wkcn)
> 
> Congratulations Kan! You're well deserved!
> 
> -Original Message-
> From: Sheng Zha [mailto:szha@gmail.com]
> Sent: Tuesday, February 19, 2019 2:10 PM
> To: dev@mxnet.incubator.apache.org; d...@mxnet.apache.org
> Cc: Anirudh Subramanian ; Jackie Wu
> 
> Subject: [Announcement] New Committer - Kan Wu (@wkcn)
> 
> Hi,
> 
> Please join me in welcoming Kan Wu (@wkcn), as a new committer!
> 
> Kan has brought many valuable contributions to MXNet [1]. He also enriches
> the MXNet ecosystem with his operator toolkit MobulaOP.
> 
> We are excited to have Kan join us as a committer.
> 
> -sz
> 
> [1]
> https://github.com/apache/incubator-
> mxnet/pulls?utf8=%E2%9C%93=is%3Apr+author%3Awkcn+
> [2] https://github.com/wkcn/MobulaOP


RE: [VOTE] Release Apache MXNet (incubating) version 1.4.0.rc2

2019-02-12 Thread Zhao, Patric
Update: the issue is fixed and the new patch release, MKL-DNN 0.17.4, is out.

Tao filed a PR to update the MKLDNN version on the 1.4.x release branch:
https://github.com/apache/incubator-mxnet/pull/14141

Thanks for all of your help :)

--Patric


> -Original Message-
> From: Zhao, Patric [mailto:patric.z...@intel.com]
> Sent: Tuesday, February 5, 2019 11:53 AM
> To: dev@mxnet.incubator.apache.org
> Cc: Lv, Tao A ; Ye, Jason Y 
> Subject: RE: [VOTE] Release Apache MXNet (incubating) version 1.4.0.rc2
> 
> Hi Sheng,
> 
> Thanks to raise this important issues. Sorry for the lack of validation since 
> we
> don't have mac machine with earlier OS version in house.
> 
> I will contact with MKL-DNN team for the supports of earlier versions of OSX
> but I'm a little afraid the fix needs some extra-time.
> 
> Alternatively, several workarounds in my thoughts (I know it's not the perfect
> solution):
> 
> * using LLVM which can work crossing HW/OS generation
> https://github.com/apache/incubator-
> mxnet/blob/master/MKLDNN_README.md#2
> 
> * provide the binary build for different HW/OS like cuda, mxnet-cu90/92
> 
> * disable MKLDNN supports for earlier versions of HW/OS in MAC, only
> mxnet build.
> 
> I will update the status when get the feedback and schedule from MKL-DNN
> team.
> 
> Feel free to let us know if anything we can help.
> 
> Thanks,
> 
> --Patric
> 
> 
> > -Original Message-
> > From: Sheng Zha [mailto:szha@gmail.com]
> > Sent: Tuesday, February 5, 2019 10:33 AM
> > To: dev@mxnet.incubator.apache.org
> > Subject: Re: [VOTE] Release Apache MXNet (incubating) version
> > 1.4.0.rc2
> >
> > Also, recent MKLDNN upgrade prevents us from offering binary
> > distribution for earlier versions of OSX, as it now requires OSX
> > 10.13. This means we would need to drop the binary distribution
> > support for OSX 10.11 and 10.12 if we are to keep mkldnn as a
> > dependency for mxnet-mkl. I'm inquiring whether Intel could extend the
> > compatibility to earlier OSX [1], but even if this is solved upstream it 
> > would
> require an update on the mkldnn submodule.
> >
> > -sz
> >
> > [1] https://github.com/intel/mkl-dnn/issues/405
> >
> > On Mon, Feb 4, 2019 at 3:47 PM Anirudh Subramanian
> > 
> > wrote:
> >
> > > -0
> > >
> > > Thanks Steffen for your release efforts !
> > >
> > > Build from source works with make but fails with cmake for me.
> > >
> > >  cd build && cmake VERBOSE=1 -DUSE_CUDA=ON -DUSE_CUDNN=ON
> > > -DUSE_OPENMP=ON -DCMAKE_BUILD_TYPE=Debug -
> > DUSE_DIST_KVSTORE=0
> > > -DUSE_OPENCV=1 -GNinja .. && ninja -v
> > >
> > > FAILED: : && /usr/bin/c++   -Wall -Wno-unknown-pragmas -fPIC -g -O0 -
> > msse2
> > > -std=c++11 -fopenmp -g  -pthread
> > >
> > > 3rdparty/dmlc-core/test/unittest/CMakeFiles/dmlc_unit_tests.dir/unit
> > > te
> > > st_lockfree.cc.o
> > >
> > > 3rdparty/dmlc-core/test/unittest/CMakeFiles/dmlc_unit_tests.dir/unit
> > > te
> > > st_param.cc.o
> > >
> > > 3rdparty/dmlc-core/test/unittest/CMakeFiles/dmlc_unit_tests.dir/unit
> > > te
> > > st_parser.cc.o
> > >
> > > 3rdparty/dmlc-core/test/unittest/CMakeFiles/dmlc_unit_tests.dir/unit
> > > te
> > > st_array_view.cc.o
> > >
> > > 3rdparty/dmlc-core/test/unittest/CMakeFiles/dmlc_unit_tests.dir/unit
> > > te
> > > st_any.cc.o
> > >
> > > 3rdparty/dmlc-core/test/unittest/CMakeFiles/dmlc_unit_tests.dir/unit
> > > te
> > > st_config.cc.o
> > >
> > > 3rdparty/dmlc-core/test/unittest/CMakeFiles/dmlc_unit_tests.dir/unit
> > > te
> > > st_threaditer.cc.o
> > >
> > > 3rdparty/dmlc-core/test/unittest/CMakeFiles/dmlc_unit_tests.dir/unit
> > > te
> > > st_serializer.cc.o
> > >
> > > 3rdparty/dmlc-core/test/unittest/CMakeFiles/dmlc_unit_tests.dir/unit
> > > te
> > > st_threaditer_exc_handling.cc.o
> > >
> > > 3rdparty/dmlc-core/test/unittest/CMakeFiles/dmlc_unit_tests.dir/unit
> > > te
> > > st_inputsplit.cc.o
> > >
> > > 3rdparty/dmlc-core/test/unittest/CMakeFiles/dmlc_unit_tests.dir/unit
> > > te
> > > st_logging.cc.o
> > >
> > > 3rdparty/dmlc-core/test/unittest/CMakeFiles/dmlc_unit_tests.dir/unit
> > > te
> > > st_json.cc.o
> > >
> > > 3rdparty/dmlc-core/test/unittest/CMakeFiles/dmlc_unit_tests.dir/unit

RE: Third-party package tests for MXNet nightly builds

2019-02-11 Thread Zhao, Patric
I agree we should track the 3rd-party packages that make MXNet more prosperous :)

Before building the CI, I suggest creating the related labels on GitHub, like sockeye, 
gluonCV, gluonNLP, etc., and giving high priority to the corresponding 
issues/PRs.
That way issues/PRs can be fixed quickly and these important applications will 
not be blocked again.

We can help with performance/backend/operator related issues as well :)

Thanks,

--Patric 



> -Original Message-
> From: Chance Bair [mailto:chanceb...@gmail.com]
> Sent: Monday, February 11, 2019 11:28 PM
> To: dev@mxnet.incubator.apache.org
> Cc: d...@mxnet.apache.org
> Subject: Re: Third-party package tests for MXNet nightly builds
> 
> Hi Felix,
> 
> Thank you for the request!  The CI team is currently working on improving
> our benchmarking platform and will evaluate this request carefully.
> 
> Chance Bair
> 
> 
> 
> On Mon, Feb 11, 2019 at 3:59 PM Carin Meier 
> wrote:
> 
> > Can't speak for the CI team, but in general I think that it is good idea.
> >
> > On a separate note, I've been playing around with Sockeye recently and
> > it's great! Awesome work and glad to see MXNet used for such cutting
> > edge use cases.
> > I'd love to see closer collaboration with the Sockeye team and MXNet
> > for innovation, cross pollination, and evangelization of what MXNet can
> do .
> >
> > Best,
> > Carin
> >
> > On Mon, Feb 11, 2019 at 6:01 AM Felix Hieber 
> > wrote:
> >
> > > Hello dev@,
> > >
> > >
> > >
> > > I would like to ask around whether there is interest in the
> > > community to test nightly builds of MXNet with third-party packages
> > > that depend on
> > MXNet
> > > and act as early adopters. The goal is to catch regressions in MXNet
> > early,
> > > allowing time for bug fixes before a new release is cut.
> > >
> > >
> > >
> > > For example, Sockeye  is a
> > > customer
> > of
> > > new MXNet releases and aims to upgrade to latest MXNet as soon as
> > possible.
> > > Typically, we update our dependency on MXNet once a new release
> > > becomes available (through pip). However, there have been cases
> > > where new
> > releases
> > > of MXNet introduced regressions undetected by MXNet tests (hence
> > > passing the release process): the latest example is this issue
> > > , which may
> > > have been introduced already back in October, but, due to infrequent
> > > MXNet releases, has only surfaced recently and will most likely
> > > force us to
> > wait
> > > for a post or 1.4.1 release. In this particular example, Sockeye’s
> > > tests would have detected this, and the issue could have been
> > > created already
> > in
> > > October, potentially avoiding its presence in the 1.4.0 release.
> > >
> > >
> > >
> > > More generally, I think there are several third-party packages with
> > > valuable test suites (e.g. gluon-nlp) that can contribute to
> > > catching
> > MXNet
> > > regressions or incompatibilities early. Running these test suites
> > > for
> > each
> > > and every PR or commit on the MXNet main repo would be too much
> overhead.
> > > My proposal would be to trigger these tests with the nightly builds
> > > (pip
> > > releases) of MXNet in a separate CI pipeline that is able to notify
> > > the
> > 3p
> > > maintainers in a case of failure, but does not block MXNet
> > > development
> > (or
> > > nightly build releases) in any way.
> > >
> > > Roughly it would do the following:
> > >
> > >- pip install mxnet--
> > >- for each 3p package that is part of the pipeline:
> > >   - clone/setup up package
> > >   - run unit/integration tests of package with some timeout
> > >   - in case of failure, notify package owner
> > >
> > >
> > >
> > > I am not familiar with the current CI pipelines, their requirements
> > > and resources. It would be great if someone from the CI team could
> > > chime in
> > and
> > > evaluate whether such a proposal seems doable and worthwhile.
> > >
> > >
> > >
> > > Best,
> > >
> > > Felix
> > >
> >
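[Editor's note] The nightly loop proposed above (set up each package, run its tests, notify the owner on failure) can be sketched as plain Python. The package names and the `run_tests`/`notify` hooks below are illustrative stand-ins, not real infrastructure; in practice `run_tests` would clone the package and run its suite under a timeout.

```python
# Hypothetical sketch of the proposed nightly third-party test pipeline.
def run_pipeline(packages, run_tests, notify):
    """Run each package's tests against the nightly MXNet build.

    packages:  list of (package_name, owner) pairs
    run_tests: callable(name) -> bool, True if the suite passed
    notify:    callable(owner, name), invoked on failure
    """
    failures = []
    for name, owner in packages:
        if not run_tests(name):
            notify(owner, name)  # failure does not block the other packages
            failures.append(name)
    return failures

# Demo with stub hooks standing in for the real clone/test/notify steps:
notified = []
result = run_pipeline(
    [("sockeye", "felix"), ("gluon-nlp", "nlp-team")],
    run_tests=lambda name: name != "sockeye",  # pretend sockeye's tests fail
    notify=lambda owner, name: notified.append((owner, name)),
)
print(result)    # → ['sockeye']
print(notified)  # → [('felix', 'sockeye')]
```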


RE: [VOTE] Release Apache MXNet (incubating) version 1.4.0.rc2

2019-02-04 Thread Zhao, Patric
Hi Sheng,

Thanks for raising this important issue. Sorry for the lack of validation; 
we don't have a Mac machine with an earlier OS version in house.

I will contact the MKL-DNN team about support for earlier versions of OSX, 
but I'm a little afraid the fix will need some extra time.

Alternatively, several workarounds come to mind (I know they are not perfect 
solutions):

* use LLVM, which works across HW/OS generations: 
https://github.com/apache/incubator-mxnet/blob/master/MKLDNN_README.md#2

* provide binary builds for different HW/OS, as is done for CUDA (mxnet-cu90/92)

* disable MKLDNN support for earlier HW/OS versions on Mac and ship a plain mxnet 
build.

I will update the status when I get feedback and a schedule from the MKL-DNN team.

Feel free to let us know if anything we can help.

Thanks,

--Patric


> -Original Message-
> From: Sheng Zha [mailto:szha@gmail.com]
> Sent: Tuesday, February 5, 2019 10:33 AM
> To: dev@mxnet.incubator.apache.org
> Subject: Re: [VOTE] Release Apache MXNet (incubating) version 1.4.0.rc2
> 
> Also, recent MKLDNN upgrade prevents us from offering binary distribution
> for earlier versions of OSX, as it now requires OSX 10.13. This means we
> would need to drop the binary distribution support for OSX 10.11 and 10.12
> if we are to keep mkldnn as a dependency for mxnet-mkl. I'm inquiring
> whether Intel could extend the compatibility to earlier OSX [1], but even if
> this is solved upstream it would require an update on the mkldnn submodule.
> 
> -sz
> 
> [1] https://github.com/intel/mkl-dnn/issues/405
> 
> On Mon, Feb 4, 2019 at 3:47 PM Anirudh Subramanian
> 
> wrote:
> 
> > -0
> >
> > Thanks Steffen for your release efforts !
> >
> > Build from source works with make but fails with cmake for me.
> >
> >  cd build && cmake VERBOSE=1 -DUSE_CUDA=ON -DUSE_CUDNN=ON
> > -DUSE_OPENMP=ON -DCMAKE_BUILD_TYPE=Debug -
> DUSE_DIST_KVSTORE=0
> > -DUSE_OPENCV=1 -GNinja .. && ninja -v
> >
> > FAILED: : && /usr/bin/c++   -Wall -Wno-unknown-pragmas -fPIC -g -O0 -
> msse2
> > -std=c++11 -fopenmp -g  -pthread
> >
> > 3rdparty/dmlc-core/test/unittest/CMakeFiles/dmlc_unit_tests.dir/unitte
> > st_lockfree.cc.o
> >
> > 3rdparty/dmlc-core/test/unittest/CMakeFiles/dmlc_unit_tests.dir/unitte
> > st_param.cc.o
> >
> > 3rdparty/dmlc-core/test/unittest/CMakeFiles/dmlc_unit_tests.dir/unitte
> > st_parser.cc.o
> >
> > 3rdparty/dmlc-core/test/unittest/CMakeFiles/dmlc_unit_tests.dir/unitte
> > st_array_view.cc.o
> >
> > 3rdparty/dmlc-core/test/unittest/CMakeFiles/dmlc_unit_tests.dir/unitte
> > st_any.cc.o
> >
> > 3rdparty/dmlc-core/test/unittest/CMakeFiles/dmlc_unit_tests.dir/unitte
> > st_config.cc.o
> >
> > 3rdparty/dmlc-core/test/unittest/CMakeFiles/dmlc_unit_tests.dir/unitte
> > st_threaditer.cc.o
> >
> > 3rdparty/dmlc-core/test/unittest/CMakeFiles/dmlc_unit_tests.dir/unitte
> > st_serializer.cc.o
> >
> > 3rdparty/dmlc-core/test/unittest/CMakeFiles/dmlc_unit_tests.dir/unitte
> > st_threaditer_exc_handling.cc.o
> >
> > 3rdparty/dmlc-core/test/unittest/CMakeFiles/dmlc_unit_tests.dir/unitte
> > st_inputsplit.cc.o
> >
> > 3rdparty/dmlc-core/test/unittest/CMakeFiles/dmlc_unit_tests.dir/unitte
> > st_logging.cc.o
> >
> > 3rdparty/dmlc-core/test/unittest/CMakeFiles/dmlc_unit_tests.dir/unitte
> > st_json.cc.o
> >
> > 3rdparty/dmlc-core/test/unittest/CMakeFiles/dmlc_unit_tests.dir/unitte
> > st_optional.cc.o
> >
> > 3rdparty/dmlc-core/test/unittest/CMakeFiles/dmlc_unit_tests.dir/unitte
> > st_main.cc.o
> >
> > 3rdparty/dmlc-core/test/unittest/CMakeFiles/dmlc_unit_tests.dir/unitte
> > st_env.cc.o
> >
> > 3rdparty/dmlc-core/test/unittest/CMakeFiles/dmlc_unit_tests.dir/unitte
> > st_thread_group.cc.o -o
> > 3rdparty/dmlc-core/test/unittest/dmlc_unit_tests  -rdynamic
> > lib/libgtestd.a 3rdparty/dmlc-core/libdmlc.a -lpthread && :
> >
> > 3rdparty/dmlc-
> core/test/unittest/CMakeFiles/dmlc_unit_tests.dir/unittest_logging.cc.o:
> > In function `Logging_basics_Test::TestBody()':
> >
> > /home/ubuntu/experimentals/1.4_release/build/../3rdparty/dmlc-
> core/test/unittest/unittest_logging.cc:19:
> > undefined reference to `testing::internal::DeathTest::Create(char
> > const*, testing::internal::RE const*, char const*, int,
> > testing::internal::DeathTest**)'
> > collect2: error: ld returned 1 exit status
> >
> >
> > Anirudh
> >
> > On Mon, Feb 4, 2019 at 3:09 PM Haibin Lin 
> > wrote:
> >
> > > +1 built from source on Linux and passed dist sync kvstore test.
> > >
> > > On Mon, Feb 4, 2019 at 9:54 AM Lin Yuan  wrote:
> > >
> > > > +1 build from source on MacOS 10.13.6 and tested mxnet-to-coreml
> > > converter.
> > > >
> > > > On Mon, Feb 4, 2019 at 9:03 AM Indhu 
> wrote:
> > > >
> > > > > +1
> > > > >
> > > > > Build from source and tested few examples from the examples folder.
> > > > >
> > > > > Thanks,
> > > > > Indu
> > > > >
> > > > >
> > > > >
> > > > > On Fri, Feb 1, 2019 at 6:21 PM Steffen Rochel <
> > steffenroc...@gmail.com
> > > >
> > > > > wrote:
> > > > >
> > > > > > Hi Sheng - thanks for the 

RE: Taxonomy on our cwiki

2019-01-19 Thread Zhao, Patric
+1, Good idea. 

It's not very easy to find related content, since there are lots of folders on 
the wiki.


> -Original Message-
> From: Sheng Zha [mailto:zhash...@apache.org]
> Sent: Saturday, January 19, 2019 3:28 AM
> To: dev@mxnet.incubator.apache.org
> Subject: Taxonomy on our cwiki
> 
> Hi MXNet,
> 
> Given that currently cwiki is the only place other than mxnet website for
> mxnet-related documentation, I'd like to request your attention to the
> (slightly disorganized) cwiki page of MXNet. The top level folders (and their
> contents) currently looks like this:
> - Design Proposals* (bag of proposals, not in order)
> - Development* (mixture of guides, roadmaps, processes)
> - Release Process (release notes)
> - Website (guides and proposals)
> - MXNet Clojure (call for contribution, guides)
> - MXNet Keras Integration (design)
> - MXNet-ONNX Integration (design, dev status)
> - MXNet R Package (guide, backlog)
> - MXNet-Scala (design, dev status, guide)
> - Content Formatting Templates (not a folder but link to two docs)
> - How-to articles (1 guide)
> - Community (guide on apache-related processes)
> - Data IO (designs)
> - Continuous Integration (guides, designs)
> - Meetups and Hangouts (events)
> 
> And here are two good examples from successful Apache projects:
> - Apache Flink: an **audience-oriented** structure [1]
>   Users (Presentations and How-to)
>   Contributors (Dev processes and How-to)
>   Committers (Infra, Dev processes, Release processes, Releases)
>   Roadmaps and Feature Designs (archive)
> - Apache OpenNLP: a **content-oriented** structure [2]
>   Guides
>   External Resources
>   Proposals
>   Releasing
> 
> Clean organization helps content discovery and saves time on locating useful
> content. Given that we have good amount of content on the wiki page, I
> suggest that we decide on a cleaner taxonomy, re-organize contents
> accordingly, and add future contents accordingly. To provide a starting point
> for the discussion, I suggest:
> - Given the state we are in, start with content-oriented organization, use
> these top-level categories: Guides (including processes and how-tos),
> Development (including designs, proposals, notes, roadmaps), Community
> (including events, activities, external resources and contents)
> - If people strongly prefer audience-oriented structure, later we can adopt a
> structure similar to Flink's.
> 
> Feel free to share your thoughts and preferences here. Thanks.
> 
> -sz
> 
> [1]
> https://cwiki.apache.org/confluence/display/FLINK/Apache+Flink+Home
> [2] https://cwiki.apache.org/confluence/display/OPENNLP/Index


RE: Design proposal - MXNet end to end models - Models with data transformations

2019-01-16 Thread Zhao, Patric
+1 for this great proposal. 

MXNet will be more flexible and portable with this new feature :)

Thanks,

--Patric


> -Original Message-
> From: sandeep krishnamurthy [mailto:sandeep.krishn...@gmail.com]
> Sent: Thursday, January 17, 2019 8:47 AM
> To: dev@mxnet.incubator.apache.org
> Subject: Design proposal - MXNet end to end models - Models with data
> transformations
> 
> Hello Community,
> 
> Me along with fellow MXNet contributors (Jake
> , Karan ) are
> working on the following problem:
> 1. Some of the data transformations used in training are applicable during
> inference. Most commonly, the transformations on validation data are the same
> as the transformations required during inference.
> 2. MXNet models do not contain data transformations as part of the graph,
> making it harder, more time consuming, and a duplicated effort to re-create
> data transformations during inference. This problem is more evident in cross-
> language use cases: training in Gluon (Python) and inference in Java/C++.
> 
> After few initial discussions with some of MXNet contributors (Zhi
> , Naveen ,
> Sina ), design proposal, development plan,
> tasks, milestones and more details are captured in this document.
> https://cwiki.apache.org/confluence/display/MXNET/MXNet+end+to+end+
> models
> 
> Please do provide your feedback via comments in the document or on this e-
> mail. All contributions are welcome. I will be creating JIRA stories and 
> issues
> for initial tasks identified.
> 
> --
> Sandeep Krishnamurthy
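The end-to-end idea in the proposal above can be sketched in plain Python. Note this is a hypothetical illustration, not the actual MXNet/Gluon API: the point is that the transforms travel with the model, so inference code never has to re-implement them.

```python
# Hypothetical sketch (not the actual MXNet/Gluon API): bundle the
# validation-time data transformations with the model so that inference
# code does not have to re-create them, even across languages.

class TransformedModel:
    """Wraps a model together with its preprocessing transforms."""

    def __init__(self, transforms, model):
        self.transforms = list(transforms)  # applied in order before the model
        self.model = model

    def predict(self, raw_input):
        x = raw_input
        for t in self.transforms:
            x = t(x)          # e.g. resize, normalize, to-tensor
        return self.model(x)  # forward pass on the preprocessed data

# Toy usage: a "normalize" transform plus a trivial stand-in "model".
normalize = lambda v: [(e - 0.5) / 0.5 for e in v]
model = lambda v: sum(v)  # stands in for a network forward pass
bundled = TransformedModel([normalize], model)
print(bundled.predict([0.5, 1.0, 1.5]))  # 3.0
```

Exporting such a bundle as one artifact is what lets a Java/C++ consumer run the same preprocessing as the Python training code.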


[ANNOUNCE] MKLDNN becomes the default CPU backend in Apache/MXNet master branch

2019-01-11 Thread Zhao, Patric
Dear all,

I am pleased to announce that MKLDNN is now the default CPU backend in the 
master branch for the Linux platform.
(note: the nightly build and release don't change)

Many thanks for the great support and joint work from the community.

Feedbacks are highly appreciated :)

Related links:

1.   Integration design: 
https://cwiki.apache.org/confluence/display/MXNET/The+design+of+MKLDNN+integration

2.   Performance and accuracy: 
https://cwiki.apache.org/confluence/display/MXNET/MXNet+with+Intel+MKL-DNN+-+Performance+Benchmarking

3.   MKLDNN README: 
https://github.com/apache/incubator-mxnet/blob/master/MKLDNN_README.md

Thanks,

--Patric



RE: [Annoucement] New Committer -- Da Zheng

2018-12-17 Thread Zhao, Patric
Congratulations, Da! 

Many thanks for your great support, and looking forward to more cooperation 
together :)

> -Original Message-
> From: Tianqi Chen [mailto:tqc...@apache.org]
> Sent: Tuesday, December 18, 2018 1:02 AM
> To: dev@mxnet.incubator.apache.org
> Subject: [Annoucement] New Committer -- Da Zheng
> 
> Dear Community:
> 
> Please join me to welcome Da Zheng as a new committer of the MXNet.
> 
> Da is the main author of the MKL-DNN integration and recently he championed
> the control flow support. He is one of the few "explorer style" contributors
> in the community, whom we desperately need in this fast-changing environment
> of the deep learning system landscape.
> 
> PRs https://github.com/apache/incubator-mxnet/commits?author=zheng-da
> reviews  https://github.com/apache/incubator-mxnet/pulls?utf8=%E2%9C%93&q=is%3Apr+reviewed-by%3Azheng-da+
> dev@  https://lists.apache.org/list.html?d...@mxnet.apache.org:lte=3y:da-zheng
> 
> Tianqi


RE: Include MKLDNN into default mxnet pip package

2018-12-10 Thread Zhao, Patric
+1, thanks for the efforts, Alex.



> -Original Message-
> From: Alex Zai [mailto:aza...@gmail.com]
> Sent: Tuesday, December 11, 2018 8:00 AM
> To: dev@mxnet.incubator.apache.org
> Subject: Include MKLDNN into default mxnet pip package
> 
> Continuation from the following thread:
> https://lists.apache.org/thread.html/bcb1bd5046ff51049a0556098e756578f
> 6fa6564831d77fddb56432f@%3Cdev.mxnet.apache.org%3E
> 
> I am also +1 for making it on master and testing until 1.5.0. We can decide
> later on (before 1.5.0) to enable mkldnn as default for the nightly build (pip
> install --pre build) to try to get more feedback if needed.
> 
> - What the story is like when there's no AVX instructions present on CPUs.
> Do we get an illegal instruction error, or does it fallback gracefully?
> According to this issue (
> https://github.com/apache/incubator-mxnet/issues/11911), AVX2 is the
> minimum requirement for pre-build binaries.
> 
> - Are there any outstanding issues when MKLDNN is enabled?
> -There is one issues with quantization int8 of mkldnn (will create issue about
> it when team gives me reproducible code snippet). Additionally, we are
> waiting to merge the PR to build mkldnn statically with mac/linux when
> building from source after MKL is added to the CI.
> 
> 
> - MKLDNN is a submodule dependency, are we pulling the latest commit or
> releases? If not we should move to releases before we make it a default I
> agree. We should tag mxnet only to releases from now on. Currently it is
> tagged to 0.17.1
> 
> Please let me know if there any other outstanding issues, else we are going
> to make mkldnn / cmake default in the Make/CMakefile.
> 
> Alex


RE: Apache MXNet v1.4.0 release status

2018-12-08 Thread Zhao, Patric
Hi Steffen,

I saw the draft of 1.4 release notes in here 
(https://cwiki.apache.org/confluence/display/MXNET/Apache+MXNet+%28incubating%29+1.4.0+Release+Notes).

Is this near the final version?  I'd like to add some descriptions of new 
quantization features enabled in 1.4.

Is it OK?

Thanks,

--Patric


> -Original Message-
> From: Steffen Rochel [mailto:steffenroc...@gmail.com]
> Sent: Saturday, December 8, 2018 1:12 AM
> To: dev@mxnet.incubator.apache.org
> Subject: Apache MXNet v1.4.0 release status
> 
> Dear MXNet community -
> I would like to provide an update on v1.4.0 status; details are tracked here:
> https://cwiki.apache.org/confluence/display/MXNET/Apache+MXNet+%28incubating%29+1.4.0+Release+Plan+and+Status
> 
> Thank you very much for everybody effort to resolve the identified issues.
> We are down to 3 open issues - for details please see
> https://cwiki.apache.org/confluence/display/MXNET/Apache+MXNet+%28incubating%29+1.4.0+Release+Plan+and+Status#ApacheMXNet(incubating)1.4.0ReleasePlanandStatus-OpenPRstotrack
> Please help to resolve the remaining issues and integrate to v1.4.x branch.
> Current estimate to address the identified security vulnerabilities in the
> Scala/Java package and merge into v1.4.x branch is end of next week
> (December 14th) I will communicate as soon I have more information.
> 
> Regards,
> Steffen


RE: LSTM regression (was RE: Include MKLDNN into default mxnet pip package)

2018-11-28 Thread Zhao, Patric
MKL-DNN v0.17.1 is released https://github.com/intel/mkl-dnn/tree/v0.17.1

I have submitted the PR to pin this release version.

Thanks,

--Patric

> -Original Message-
> From: Zhao, Patric [mailto:patric.z...@intel.com]
> Sent: Wednesday, November 28, 2018 8:07 PM
> To: dev@mxnet.incubator.apache.org
> Subject: LSTM regression (was RE: Include MKLDNN into default mxnet pip
> package)
> 
> Hi Anirudh,
> 
> The LSTM performance bug is fixed by MKL-DNN, and the PR is here
> (https://github.com/apache/incubator-mxnet/pull/13417).
> 
> I am still working with the MKL-DNN team to get a patch release for MXNet 1.4 in
> 1 or 2 days.
> 
> Will update the status soon.
> 
> Thanks everyone.
> 
> --Patric
> 
> > -Original Message-
> > From: Anirudh Subramanian [mailto:anirudh2...@gmail.com]
> > Sent: Tuesday, November 27, 2018 6:16 AM
> > To: dev@mxnet.incubator.apache.org
> > Subject: Re: Include MKLDNN into default mxnet pip package
> >
> > Hi Tao,
> >
> > I agree with Steffen that we can start with a stable release for
> > MKLDNN for 1.4.0. For your suggestion on using 0.17, can you provide
> > info on what versioning mechanism MKLDNN uses. Once a MKLDNN
> release
> > is out and there are some regressions found like the LSTM regression,
> > would it be possible to do a patch release for it or maintain a release
> branch for it ?
> >
> > Anirudh
> >
> > On Sun, Nov 25, 2018 at 5:03 PM Lv, Tao A  wrote:
> >
> > > Hi Steffen,
> > >
> > > I think all the commits on MKL-DNN master branch are well tested for
> > > MKL-DNN development team. If we really want to have a release commit
> > > in the coming 1.4 mxnet release, my suggestion is 0.17 MKL-DNN release.
> > >
> > > Thank you,
> > > Tao
> > >
> > > Sent from my iPhone
> > >
> > > > On Nov 26, 2018, at 8:09 AM, Steffen Rochel
> > > > 
> > > wrote:
> > > >
> > > > +1 to make MKL-DNN default.
> > > > I'm tracking
> > > > https://github.com/apache/incubator-mxnet/issues/13369
> > > > as open issue to be addressed for 1.4.0 I do agree that we should
> > > > move to a model to include released
> > > dependencies
> > > > instead of just taking bleeding edge snapshots.
> > > > However, speed of development is important as well.
> > > > As a compromise for 1.4.0 release with MKL-DNN: can the MKL-DNN
> > > development
> > > > team provide us with a well tested tag/commit id to include in
> > > > 1.4.0 release?
> > > > Steffen
> > > >
> > > >> On Wed, Nov 21, 2018 at 11:42 PM Lv, Tao A 
> > wrote:
> > > >>
> > > >> Thanks for the information, Kellen and Naveen.
> > > >>
> > > >> Better than onnx-tensorrt, MKL-DNN has already provided
> > > >> versioning and release tags. My concern is that as MKL-DNN is
> > > >> still under intensive development, if it has a new feature or bug
> > > >> fix on its master branch,
> > > do we
> > > >> really want to wait for next release to get it supported in MXNet?
> > > >>
> > > >> Take the LSTM regression as an example, probably MKL-DNN will
> > > >> give a fix or improvement on its master branch soon, do we need
> > > >> to wait for 0.18 release to get it fixed for mxnet user? AFAIK,
> > > >> tensorflow is also using normal commit id, not release, as the
> > > >> dependency for MKL-
> > DNN.
> > > >>
> > > >> Regarding the LSTM regression, we are using internal JIRA tickets
> > > >> rather than github issues to track the defects of MKL-DNN. But I
> > > >> agree with
> > > you,
> > > >> we need update the progress of it in Alex's issue.
> > > >>
> > > >> Thanks,
> > > >> -tao
> > > >>
> > > >> -Original Message-
> > > >> From: kellen sunderland [mailto:kellen.sunderl...@gmail.com]
> > > >> Sent: Thursday, November 22, 2018 10:55 AM
> > > >> To: dev@mxnet.incubator.apache.org
> > > >> Subject: Re: Include MKLDNN into default mxnet pip package
> > > >>
> > > >> Agree with your point about other repos also not being based on
> > > versioning
> > > >> Tao.  I would point out that I've given some that I've worked
> > > >> with

RE: Include MKLDNN into default mxnet pip package

2018-11-28 Thread Zhao, Patric
+1 for making MKL-DNN default in master branch first for broad testing :)

 My suggestion is to make MKL-DNN default on the master branch
 firstly after 1.4.0 releasing branch is cut off. That will help MKL-DNN backend
 to be widely used and tested by MXNet users who are building MXNet from
 source. It will also help to expose issues of MKL-DNN backend in the next
 releasing cycle. We can decide whether to make it default in pip package for
 1.5.0 release according to the feedback from the community. For 1.4.0
 release, we can still have MKL-DNN in the mxnet-mkl package.

> -Original Message-
> From: Zai, Alexander [mailto:alex...@amazon.com.INVALID]
> Sent: Thursday, November 29, 2018 4:06 AM
> To: dev@mxnet.incubator.apache.org
> Subject: Re: Include MKLDNN into default mxnet pip package
> 
> Thanks for answering Tao. I would like to add that we have the env flag that
> disables MKLDNN operators if regression occurs.
> 
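The fallback pattern Alex mentions can be sketched as follows. The flag name `MXNET_MKLDNN_ENABLED` and its exact semantics are an assumption here; check the MXNet environment-variable documentation for your version.

```python
import os

# Sketch of the fallback pattern discussed above: an environment flag that
# lets users disable the MKL-DNN operators if a regression shows up.
# The flag name and its default are assumptions, not a documented contract.

def mkldnn_enabled(environ=os.environ):
    # Enabled unless explicitly switched off with "0".
    return environ.get("MXNET_MKLDNN_ENABLED", "1") != "0"

print(mkldnn_enabled({}))                             # True  (default on)
print(mkldnn_enabled({"MXNET_MKLDNN_ENABLED": "0"}))  # False (fallback path)
```

In practice the user would export the variable before launching training, e.g. `MXNET_MKLDNN_ENABLED=0 python train.py`, and the backend dispatch would consult a check like the one above.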
> On 11/28/18, 6:05 AM, "Lv, Tao A"  wrote:
> 
Hi Hagay, thank you for bringing these questions together. I also
summarized my opinions here so they are easy to check.
> 
> - Make MKL-DNN default in MXNet pip package
> [Tao]: My suggestion is to make MKL-DNN default on the master branch
> firstly after 1.4.0 releasing branch is cut off. That will help MKL-DNN 
> backend
> to be widely used and tested by MXNet users who are building MXNet from
> source. It will also help to expose issues of MKL-DNN backend in the next
> releasing cycle. We can decide whether to make it default in pip package for
> 1.5.0 release according to the feedback from the community. For 1.4.0
> release, we can still have MKL-DNN in the mxnet-mkl package.
> 
> - What the story is like when there's no AVX instructions present on CPUs.
> Do we get an illegal instruction error, or does it fallback gracefully?
> [Tao]: MKL-DNN has optimizations for every ISA starting with SSE4.2 and
> there is a list for those platforms which are officially supported by MKL-DNN:
> https://github.com/intel/mkl-dnn#system-requirements. It should fallback if
> AVX is not supported. Most of computation intensive kernels in MKL-DNN are
> JITed. So they are supposed to generate code according to the platform
> during runtime and should not have any illegal instruction. For non-JIT code
> in MKL-DNN, same as other code in MXNet, it will generate instructions
> according to the options/flags of compiler. We can set -DARCH_OPT_FLAGS
> when build MKL-DNN to avoid optimization for compiling machine. That's
> exactly what we are doing for MKL-DNN build in MXNet. Even without MKL-
> DNN, I noticed there were issues about illegal instructions of MXNet when
> users import the pip package on a lower end machine which probably only
> supports SSE.
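As a rough illustration of the ISA check behind this fallback story, here is a sketch that inspects a cpuinfo-style flags line; the sample string is made up for illustration, not read from a real machine.

```python
# MKL-DNN JITs its compute kernels per ISA, with SSE4.2 as its documented
# baseline. A sketch of checking which of those ISAs a CPU reports, using
# the "flags" line format found in /proc/cpuinfo on Linux.

def supported_isas(flags_line):
    flags = set(flags_line.split())
    return {isa for isa in ("sse4_2", "avx", "avx2", "avx512f") if isa in flags}

# Illustrative flags line; a real one comes from /proc/cpuinfo.
sample = "fpu sse sse2 ssse3 sse4_1 sse4_2 avx avx2"
print(sorted(supported_isas(sample)))  # ['avx', 'avx2', 'sse4_2']
```

A machine reporting only SSE would yield an empty set here, which is the case where a pre-built binary compiled for AVX2 raises the illegal-instruction errors mentioned in the thread.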
> 
> - Are there any outstanding issues when MKLDNN is enabled?
> [Tao]: I don’t know any at this time except the LSTM regression which
> hopefully will be fixed soon. I notice the fix has been pushed to MKL-DNN
> master branch. But if we decide to depend on release version only, we need
> wait for the release process of MKL-DNN finishing. If anyone knows other
> issues about MKL-DNN backend, feel free to let me know. :)
> 
> - MKLDNN is a submodule dependency, are we pulling the latest commit or
> releases? If not we should move to releases before we make it a default
> [Tao]: I don't have strong resistance to release version. But if you want 
> to
> make a rule for MXNet that a submodule should depend on a release version,
> please take all the submodules into consideration. For MKL-DNN, my
> concern is: If the master (development) branch of MXNet relies on a bleeding
> edge commit from MKL-DNN master branch, when MXNet comes to release,
> we need revert many changes in MXNet if MKL-DNN will not have a new
> release at that time, since we need fallback the dependency to a previous
> release version. That might mess up or slow down the development and
> release of MXNet. To avoid that, we always need negotiate with MKL-DNN
> team for the release pace before every release. Please propose a solution
> for this situation and make a plan how to apply it to all submodules.
> 
> - MKLDNN versioning mechanism
> [Tao]: Copied MKL-DNN manager’s words here:
> "That's valid request and I would expect that as the software matures
> more and more applications will rely on stable versions. I would expect that
> for MXNet there is a stable branch that would rely on stable MKL-DNN and
> development branch that would rely on master.
> MKL-DNN relies on semantic versioning. We do maintain a release
> branches in addition to master that can be used to release patches. In
> particular we are planning v0.17.1 this week to deliver a fix for reorders 
> that
> you requested. This works in the following way:
> * master contains the latest development (typically the next release)
> * rls-v0.17 contains v0.17 and will be used to create minor releases
> (v0.17.1 and 
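The semantic-versioning rule being negotiated in this thread can be sketched as follows (illustrative only): a patch release such as v0.17.1 keeps the same major.minor and can be taken on a release branch without pulling new features from master.

```python
# Sketch of the semver compatibility rule under discussion: a patch release
# (e.g. v0.17.1 on the rls-v0.17 branch) only changes the patch number, so a
# consumer pinned to v0.17.x can pick it up safely.

def parse_semver(tag):
    major, minor, patch = (int(p) for p in tag.lstrip("v").split("."))
    return major, minor, patch

def is_patch_update(current, candidate):
    """True if `candidate` is a bug-fix-only update relative to `current`."""
    cur, cand = parse_semver(current), parse_semver(candidate)
    return cand[:2] == cur[:2] and cand[2] > cur[2]

print(is_patch_update("v0.17.0", "v0.17.1"))  # True  - safe on rls-v0.17
print(is_patch_update("v0.17.0", "v0.18.0"))  # False - a new minor release
```

Under this scheme, pinning the MXNet submodule to a release tag still allows urgent fixes (like the LSTM reorder fix) to arrive as patch releases on the release branch.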

LSTM regression (was RE: Include MKLDNN into default mxnet pip package)

2018-11-28 Thread Zhao, Patric
Hi Anirudh,

The LSTM performance bug is fixed by MKL-DNN, and the PR is here 
(https://github.com/apache/incubator-mxnet/pull/13417).

I am still working with the MKL-DNN team to get a patch release for MXNet 1.4 in 1 or 
2 days.

Will update the status soon.

Thanks everyone.

--Patric

> -Original Message-
> From: Anirudh Subramanian [mailto:anirudh2...@gmail.com]
> Sent: Tuesday, November 27, 2018 6:16 AM
> To: dev@mxnet.incubator.apache.org
> Subject: Re: Include MKLDNN into default mxnet pip package
> 
> Hi Tao,
> 
> I agree with Steffen that we can start with a stable release for MKLDNN for
> 1.4.0. For your suggestion on using 0.17, can you provide info on what
> versioning mechanism MKLDNN uses. Once a MKLDNN release is out and
> there are some regressions found like the LSTM regression, would it be
> possible to do a patch release for it or maintain a release branch for it ?
> 
> Anirudh
> 
> On Sun, Nov 25, 2018 at 5:03 PM Lv, Tao A  wrote:
> 
> > Hi Steffen,
> >
> > I think all the commits on MKL-DNN master branch are well tested for
> > MKL-DNN development team. If we really want to have a release commit
> > in the coming 1.4 mxnet release, my suggestion is 0.17 MKL-DNN release.
> >
> > Thank you,
> > Tao
> >
> > Sent from my iPhone
> >
> > > On Nov 26, 2018, at 8:09 AM, Steffen Rochel
> > > 
> > wrote:
> > >
> > > +1 to make MKL-DNN default.
> > > I'm tracking  https://github.com/apache/incubator-mxnet/issues/13369
> > > as open issue to be addressed for 1.4.0 I do agree that we should
> > > move to a model to include released
> > dependencies
> > > instead of just taking bleeding edge snapshots.
> > > However, speed of development is important as well.
> > > As a compromise for 1.4.0 release with MKL-DNN: can the MKL-DNN
> > development
> > > team provide us with a well tested tag/commit id to include in 1.4.0
> > > release?
> > > Steffen
> > >
> > >> On Wed, Nov 21, 2018 at 11:42 PM Lv, Tao A 
> wrote:
> > >>
> > >> Thanks for the information, Kellen and Naveen.
> > >>
> > >> Better than onnx-tensorrt, MKL-DNN has already provided versioning
> > >> and release tags. My concern is that as MKL-DNN is still under
> > >> intensive development, if it has a new feature or bug fix on its
> > >> master branch,
> > do we
> > >> really want to wait for next release to get it supported in MXNet?
> > >>
> > >> Take the LSTM regression as an example, probably MKL-DNN will give
> > >> a fix or improvement on its master branch soon, do we need to wait
> > >> for 0.18 release to get it fixed for mxnet user? AFAIK, tensorflow
> > >> is also using normal commit id, not release, as the dependency for MKL-
> DNN.
> > >>
> > >> Regarding the LSTM regression, we are using internal JIRA tickets
> > >> rather than github issues to track the defects of MKL-DNN. But I
> > >> agree with
> > you,
> > >> we need update the progress of it in Alex's issue.
> > >>
> > >> Thanks,
> > >> -tao
> > >>
> > >> -Original Message-
> > >> From: kellen sunderland [mailto:kellen.sunderl...@gmail.com]
> > >> Sent: Thursday, November 22, 2018 10:55 AM
> > >> To: dev@mxnet.incubator.apache.org
> > >> Subject: Re: Include MKLDNN into default mxnet pip package
> > >>
> > >> Agree with your point about other repos also not being based on
> > versioning
> > >> Tao.  I would point out that I've given some that I've worked with
> > similar
> > >> feedback: https://github.com/onnx/onnx-tensorrt/issues/68
> > >>
> > >>> On Wed, Nov 21, 2018 at 6:48 PM Naveen Swamy
> 
> > wrote:
> > >>>
> > >>> Tao,
> > >>>
> > >>> You are right there are many submodules in 3rd party. We have to
> > >>> start somewhere and I believe this one is a good candidate to start
> with.
> > >>> This is not to cater to release of MXNet or to tie them with the
> > >>> releases of the submodules but instead to pick only stable
> > >>> releases and not to pick up bleeding edge commits from the tip of
> > >>> the master, this gives us confidence in the submodule that MXNet
> > >>> users are depending on that especially if we make MKLDNN the default.
> > >>>
> > >>> Good to know it is known already as a regression.Alex has created
> > >>> this issue https://github.com/apache/incubator-mxnet/issues/13369,
> > >>> please add details and link the corresponding issue in MKLDNN(I
> > >>> couldn't
> > find).
> > >>>
> > >>> -Naveen
> > >>>
> >  On Wed, Nov 21, 2018 at 6:04 PM Lv, Tao A 
> wrote:
> > 
> >  Here are my answers for the questions from Kellen and Naveen
> >  about MKL-DNN. It doesn't mean that I'm supportive for making
> >  MKL-DNN default here.
> > 
> >  @Kellen,
> > 
> >  FYI, here is a list for those platforms which are officially
> >  supported by MKL-DNN.
> >  https://github.com/intel/mkl-dnn#system-requirements
> > 
> >  Most of computation intensive kernels in MKL-DNN are JITed. So
> >  they are supposed to generate code according to the platform
> >  during runtime. For non-JIT code in MKL-DNN, 

RE: [Anouncement] New Committer: Tao Lv

2018-11-26 Thread Zhao, Patric
Congratulations, Tao.  

> -Original Message-
> From: kellen sunderland [mailto:kellen.sunderl...@gmail.com]
> Sent: Tuesday, November 27, 2018 11:17 AM
> To: dev@mxnet.incubator.apache.org
> Subject: Re: [Anouncement] New Committer: Tao Lv
> 
> Welcome Tao!
> 
> On Mon, Nov 26, 2018 at 7:13 PM Sheng Zha  wrote:
> 
> > We are pleased to announce Tao Lv as a new committer of Apache MXNet.
> > Tao's sustained contribution to the project has been greatly helping
> > the CPU performance of MXNet.
> >
> > Please join me to welcome Tao to the team!
> >
> > -sz
> >


RE: MKLDNN performance in CI

2018-11-22 Thread Zhao, Patric
Thanks, that should be the most time-consuming part.

@Marco, could you try disabling this env var and checking the performance again? 

> -Original Message-
> From: Lv, Tao A [mailto:tao.a...@intel.com]
> Sent: Friday, November 23, 2018 10:26 AM
> To: dev@mxnet.incubator.apache.org
> Subject: RE: MKLDNN performance in CI
> 
> I think yes, except the cpp test.
> 
> -Original Message-
> From: Zhao, Patric [mailto:patric.z...@intel.com]
> Sent: Friday, November 23, 2018 10:06 AM
> To: dev@mxnet.incubator.apache.org
> Subject: RE: MKLDNN performance in CI
> 
> Good point, Tao!
> Is this env enabled in all MKL-DNN CI?
> 
> > -Original Message-
> > From: Lv, Tao A [mailto:tao.a...@intel.com]
> > Sent: Friday, November 23, 2018 9:53 AM
> > To: dev@mxnet.incubator.apache.org
> > Subject: RE: MKLDNN performance in CI
> >
> > Thanks for bringing this up, Marco. It's really weird since most of
> > those tests listed in "worth noting" are not related to mkldnn backend.
> >
> > I can understand that some tests for mkldnn operator may be slower
> > because MXNET_MKLDNN_DEBUG is enabled in the CI:
> > https://github.com/apache/incubator-
> > mxnet/blob/master/ci/docker/runtime_functions.sh#L713
> >
> > -Original Message-
> > From: Marco de Abreu [mailto:marco.g.ab...@googlemail.com.INVALID]
> > Sent: Friday, November 23, 2018 9:22 AM
> > To: dev@mxnet.incubator.apache.org
> > Subject: MKLDNN performance in CI
> >
> > Hello,
> >
> > I have noticed that our Python tests have been increasing in duration
> recently.
> > In order to analyse this further, I created the PR [1] which allows to
> > record test durations. Please note that I did not dive deep on these
> > numbers and that they have to be taken with a grain of salt since
> > slaves have varying resource utilizations.
> >
> > Please have a look at the two following logs:
> > Python3 CPU MKLDNN:
> > http://jenkins.mxnet-ci.amazon-
> > ml.com/blue/rest/organizations/jenkins/pipelines/mxnet-
> > validation/pipelines/unix-cpu/branches/PR-
> > 13377/runs/2/nodes/155/steps/409/log/?start=0
> > Python3 CPU Openblas:
> > http://jenkins.mxnet-ci.amazon-
> > ml.com/blue/rest/organizations/jenkins/pipelines/mxnet-
> > validation/pipelines/unix-cpu/branches/PR-
> > 13377/runs/2/nodes/152/steps/398/log/?start=0
> >
> > If you scroll to the end (note that there are multiple test stages and
> > summaries being printed in these logs), you will find the following
> > statements:
> >
> > Python3 CPU MKLDNN: "Ran 702 tests in 3042.102s"
> > Python3 CPU Openblas: "Ran 702 tests in 2158.458s"
> >
> > This shows that the MKLDNN is generally being about 40% slower than
> > the Openblas backend. If we go into the details, we can see that some
> > tests are significantly slower:
> >
> > Python3 CPU MKLDNN:
> >
> > >[success] 20.78% test_random.test_shuffle: 630.7165s [success] 17.79%
> > >test_sparse_operator.test_elemwise_binary_ops: 540.0487s [success]
> > >10.91% test_gluon_model_zoo.test_models: 331.1503s [success] 2.62%
> > >test_operator.test_broadcast_binary_op: 79.4556s [success] 2.45%
> > >test_operator.test_pick: 74.4041s [success] 2.39%
> > >test_metric_perf.test_metric_performance: 72.5445s [success] 2.38%
> > >test_random.test_negative_binomial_generator: 72.1751s [success]
> > >1.84%
> > >test_operator.test_psroipooling: 55.9432s [success] 1.78%
> > >test_random.test_poisson_generator: 54.0104s [success] 1.72%
> > >test_gluon.test_slice_pooling2d_slice_pooling2d: 52.3447s [success]
> > >1.60% test_contrib_control_flow.test_cond: 48.6977s [success] 1.41%
> > >test_random.test_random: 42.8712s [success] 1.03%
> > >test_operator.test_layer_norm: 31.1242s
> >
> >
> > Python3 CPU Openblas:
> > > [success] 26.20% test_gluon_model_zoo.test_models: 563.3366s
> > > [success] 4.34% test_random.test_shuffle: 93.3157s [success] 4.31%
> > > test_random.test_negative_binomial_generator: 92.6899s [success]
> > > 3.78%
> > > test_sparse_operator.test_elemwise_binary_ops: 81.2048s  [success]
> > > 3.30% test_operator.test_psroipooling: 70.9090s  [success] 3.20%
> > > test_random.test_poisson_generator: 68.7500s  [success] 3.10%
> > > test_metric_perf.test_metric_performance: 66.6085s  [success] 2.79%
> > > test_operator.test_layer_norm: 59.9566s  [success] 2.66%
> > > test_gluon.test_slice_pooling2d_slic

RE: MKLDNN performance in CI

2018-11-22 Thread Zhao, Patric
Good point, Tao! 
Is this env enabled in all MKL-DNN CI? 

> -Original Message-
> From: Lv, Tao A [mailto:tao.a...@intel.com]
> Sent: Friday, November 23, 2018 9:53 AM
> To: dev@mxnet.incubator.apache.org
> Subject: RE: MKLDNN performance in CI
> 
> Thanks for bringing this up, Marco. It's really weird since most of those 
> tests
> listed in "worth noting" are not related to mkldnn backend.
> 
> I can understand that some tests for mkldnn operator may be slower
> because MXNET_MKLDNN_DEBUG is enabled in the CI:
> https://github.com/apache/incubator-
> mxnet/blob/master/ci/docker/runtime_functions.sh#L713
> 
> -Original Message-
> From: Marco de Abreu [mailto:marco.g.ab...@googlemail.com.INVALID]
> Sent: Friday, November 23, 2018 9:22 AM
> To: dev@mxnet.incubator.apache.org
> Subject: MKLDNN performance in CI
> 
> Hello,
> 
> I have noticed that our Python tests have been increasing in duration 
> recently.
> In order to analyse this further, I created the PR [1] which allows to record
> test durations. Please note that I did not dive deep on these numbers and
> that they have to be taken with a grain of salt since slaves have varying
> resource utilizations.
> 
> Please have a look at the two following logs:
> Python3 CPU MKLDNN:
> http://jenkins.mxnet-ci.amazon-
> ml.com/blue/rest/organizations/jenkins/pipelines/mxnet-
> validation/pipelines/unix-cpu/branches/PR-
> 13377/runs/2/nodes/155/steps/409/log/?start=0
> Python3 CPU Openblas:
> http://jenkins.mxnet-ci.amazon-
> ml.com/blue/rest/organizations/jenkins/pipelines/mxnet-
> validation/pipelines/unix-cpu/branches/PR-
> 13377/runs/2/nodes/152/steps/398/log/?start=0
> 
> If you scroll to the end (note that there are multiple test stages and
> summaries being printed in these logs), you will find the following
> statements:
> 
> Python3 CPU MKLDNN: "Ran 702 tests in 3042.102s"
> Python3 CPU Openblas: "Ran 702 tests in 2158.458s"
> 
> This shows that the MKLDNN is generally being about 40% slower than the
> Openblas backend. If we go into the details, we can see that some tests are
> significantly slower:
> 
> Python3 CPU MKLDNN:
> 
> >[success] 20.78% test_random.test_shuffle: 630.7165s [success] 17.79%
> >test_sparse_operator.test_elemwise_binary_ops: 540.0487s [success]
> >10.91% test_gluon_model_zoo.test_models: 331.1503s [success] 2.62%
> >test_operator.test_broadcast_binary_op: 79.4556s [success] 2.45%
> >test_operator.test_pick: 74.4041s [success] 2.39%
> >test_metric_perf.test_metric_performance: 72.5445s [success] 2.38%
> >test_random.test_negative_binomial_generator: 72.1751s [success] 1.84%
> >test_operator.test_psroipooling: 55.9432s [success] 1.78%
> >test_random.test_poisson_generator: 54.0104s [success] 1.72%
> >test_gluon.test_slice_pooling2d_slice_pooling2d: 52.3447s [success]
> >1.60% test_contrib_control_flow.test_cond: 48.6977s [success] 1.41%
> >test_random.test_random: 42.8712s [success] 1.03%
> >test_operator.test_layer_norm: 31.1242s
> 
> 
> Python3 CPU Openblas:
> > [success] 26.20% test_gluon_model_zoo.test_models: 563.3366s [success]
> > 4.34% test_random.test_shuffle: 93.3157s [success] 4.31%
> > test_random.test_negative_binomial_generator: 92.6899s [success] 3.78%
> > test_sparse_operator.test_elemwise_binary_ops: 81.2048s  [success]
> > 3.30% test_operator.test_psroipooling: 70.9090s  [success] 3.20%
> > test_random.test_poisson_generator: 68.7500s  [success] 3.10%
> > test_metric_perf.test_metric_performance: 66.6085s  [success] 2.79%
> > test_operator.test_layer_norm: 59.9566s  [success] 2.66%
> > test_gluon.test_slice_pooling2d_slice_pooling2d: 57.1887s  [success]
> > 2.62% test_operator.test_pick: 56.2312s  [success] 2.60%
> > test_random.test_random: 55.8920s  [success] 2.19%
> > test_operator.test_broadcast_binary_op: 47.1879s [success] 0.96%
> > test_contrib_control_flow.test_cond: 20.6908s
> 
> Tests worth noting:
> - test_random.test_shuffle: 700% increase - but I don't know how this may
> be related to MKLDNN. Are we doing random number generation in either of
> those backends?
> - test_sparse_operator.test_elemwise_binary_ops: 700% increase
> - test_gluon_model_zoo.test_models: 40% decrease - that's awesome and to
> be expect :)
> - test_operator.test_broadcast_binary_op: 80% increase
> - test_contrib_control_flow.test_cond: 250% increase
> - test_operator.test_layer_norm: 50% decrease - nice!
> 
> As I have stated previously, these numbers might not mean anything since
> the CI is not a benchmarking environment (sorry if these are false negatives),
> but I thought it might be worth mentioning so Intel could follow up and dive
> deeper.
> 
> Does anybody here create 1:1 operator comparisons (e.g. running
> layer_norm in the different backends to compare the performance) who
> could provide us with those numbers?
> 
> Best regards,
> Marco
> 
> [1]: https://github.com/apache/incubator-mxnet/pull/13377
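The duration-recording idea from Marco's PR can be sketched as a tiny harness that times each test and prints the slowest ones with their share of total runtime, in the same spirit as the `[success] 20.78% test_random.test_shuffle: 630.7165s` lines quoted in this thread. The test names below are illustrative.

```python
import time

# Minimal sketch of per-test duration recording: run each test, time it,
# then report tests sorted from slowest to fastest with their percentage
# of the total runtime.

def run_with_timings(tests):
    timings = []
    for name, fn in tests:
        start = time.perf_counter()
        fn()
        timings.append((name, time.perf_counter() - start))
    total = sum(d for _, d in timings)
    for name, d in sorted(timings, key=lambda t: -t[1]):
        print(f"[success] {100 * d / total:5.2f}% {name}: {d:.4f}s")
    return timings

# Illustrative stand-ins for real test functions.
tests = [
    ("test_fast", lambda: sum(range(1000))),
    ("test_slow", lambda: time.sleep(0.05)),
]
timings = run_with_timings(tests)
```

Comparing such reports between two backends (as done above for MKLDNN vs. Openblas) is what surfaces per-test regressions, with the caveat Marco notes that CI slaves are not a controlled benchmarking environment.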


RE: MKLDNN performance in CI

2018-11-22 Thread Zhao, Patric
Happy Thanksgiving, everyone :)

Hi Marco,

Thanks for raising this question. We will look into the details of the CI test 
cases, and Shufan will provide the 1:1 op-level performance data.

In general, the CI tests are not performance cases; they cover many different 
situations, even corner cases, for quality purposes.
Specifically, some overhead may be introduced when an MKL-DNN op is connected to 
a non-MKL-DNN op, since an extra data-format conversion happens at the boundary.
So the performance drop is expected behavior in these kinds of test cases, where 
the computation itself is quite tiny.

But in real workloads these situations are not so frequent, and the overall 
performance of MKL-DNN is much better than OpenBLAS.
(data in here: 
https://cwiki.apache.org/confluence/display/MXNET/MXNet+with+Intel+MKL-DNN+-+Performance+Benchmarking)

In the short term, Shufan will help to look into these test cases and figure out 
a proper solution to make the CI faster.

For the medium and long term, we are working on implementing more MKL-DNN-supported 
ops, like reshape, slice, and split, so that less data-format conversion 
will be involved.

Feel free to let me know if you have any other concerns.

BR,

--Patric
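
The 1:1 op-level comparison discussed in this thread can be sketched as a small
timing harness. This is a minimal, hypothetical sketch: the two callables passed
to `compare` stand in for invoking the same operator under the OpenBLAS and
MKL-DNN builds (normally two separate MXNet installations); the toy workloads
below are placeholders, not real operator calls.

```python
import statistics
import time


def bench(fn, warmup=5, repeats=50):
    """Time a callable, returning the median wall-clock seconds per call."""
    for _ in range(warmup):          # warm caches / lazy initialization first
        fn()
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def compare(name, baseline_fn, candidate_fn):
    """Report the candidate's speedup over the baseline (>1 means faster)."""
    base = bench(baseline_fn)
    cand = bench(candidate_fn)
    return {"op": name, "baseline_s": base, "candidate_s": cand,
            "speedup": base / cand}


# Toy stand-in workloads; in a real comparison these would call the same
# operator (e.g. layer_norm) in the OpenBLAS and MKL-DNN builds.
result = compare("toy_sum",
                 lambda: sum(range(100_000)),
                 lambda: sum(range(50_000)))
```

Running each operator from the slow list (layer_norm, broadcast_binary_op, etc.)
through such a harness on identical inputs would separate genuine op-level
regressions from CI environment noise.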
 


> -Original Message-
> From: Marco de Abreu [mailto:marco.g.ab...@googlemail.com.INVALID]
> Sent: Friday, November 23, 2018 9:22 AM
> To: dev@mxnet.incubator.apache.org
> Subject: MKLDNN performance in CI
> 
> Hello,
> 
> I have noticed that our Python tests have been increasing in duration 
> recently.
> In order to analyse this further, I created the PR [1] which allows to record
> test durations. Please note that I did not dive deep on these numbers and
> that they have to be taken with a grain of salt since slaves have varying
> resource utilizations.
> 
> Please have a look at the two following logs:
> Python3 CPU MKLDNN:
> http://jenkins.mxnet-ci.amazon-
> ml.com/blue/rest/organizations/jenkins/pipelines/mxnet-
> validation/pipelines/unix-cpu/branches/PR-
> 13377/runs/2/nodes/155/steps/409/log/?start=0
> Python3 CPU Openblas:
> http://jenkins.mxnet-ci.amazon-
> ml.com/blue/rest/organizations/jenkins/pipelines/mxnet-
> validation/pipelines/unix-cpu/branches/PR-
> 13377/runs/2/nodes/152/steps/398/log/?start=0
> 
> If you scroll to the end (note that there are multiple test stages and
> summaries being printed in these logs), you will find the following
> statements:
> 
> Python3 CPU MKLDNN: "Ran 702 tests in 3042.102s"
> Python3 CPU Openblas: "Ran 702 tests in 2158.458s"
> 
> This shows that the MKLDNN is generally being about 40% slower than the
> Openblas backend. If we go into the details, we can see that some tests are
> significantly slower:
> 
> Python3 CPU MKLDNN:
> 
> >[success] 20.78% test_random.test_shuffle: 630.7165s [success] 17.79%
> >test_sparse_operator.test_elemwise_binary_ops: 540.0487s [success]
> >10.91% test_gluon_model_zoo.test_models: 331.1503s [success] 2.62%
> >test_operator.test_broadcast_binary_op: 79.4556s [success] 2.45%
> >test_operator.test_pick: 74.4041s [success] 2.39%
> >test_metric_perf.test_metric_performance: 72.5445s [success] 2.38%
> >test_random.test_negative_binomial_generator: 72.1751s [success] 1.84%
> >test_operator.test_psroipooling: 55.9432s [success] 1.78%
> >test_random.test_poisson_generator: 54.0104s [success] 1.72%
> >test_gluon.test_slice_pooling2d_slice_pooling2d: 52.3447s [success]
> >1.60% test_contrib_control_flow.test_cond: 48.6977s [success] 1.41%
> >test_random.test_random: 42.8712s [success] 1.03%
> >test_operator.test_layer_norm: 31.1242s
> 
> 
> Python3 CPU Openblas:
> > [success] 26.20% test_gluon_model_zoo.test_models: 563.3366s [success]
> > 4.34% test_random.test_shuffle: 93.3157s [success] 4.31%
> > test_random.test_negative_binomial_generator: 92.6899s [success] 3.78%
> > test_sparse_operator.test_elemwise_binary_ops: 81.2048s  [success]
> > 3.30% test_operator.test_psroipooling: 70.9090s  [success] 3.20%
> > test_random.test_poisson_generator: 68.7500s  [success] 3.10%
> > test_metric_perf.test_metric_performance: 66.6085s  [success] 2.79%
> > test_operator.test_layer_norm: 59.9566s  [success] 2.66%
> > test_gluon.test_slice_pooling2d_slice_pooling2d: 57.1887s  [success]
> > 2.62% test_operator.test_pick: 56.2312s  [success] 2.60%
> > test_random.test_random: 55.8920s  [success] 2.19%
> > test_operator.test_broadcast_binary_op: 47.1879s [success] 0.96%
> > test_contrib_control_flow.test_cond: 20.6908s
> 
> Tests worth noting:
> - test_random.test_shuffle: 700% increase - but I don't know how this may
> be related to MKLDNN. Are we doing random number generation in either of
> those backends?
> - test_sparse_operator.test_elemwise_binary_ops: 700% increase
> - test_gluon_model_zoo.test_models: 40% decrease - that's awesome and to
> be expect :)
> - test_operator.test_broadcast_binary_op: 80% increase
> - test_contrib_control_flow.test_cond: 250% increase
> - test_operator.test_layer_norm: 50% decrease - nice!
> 
> As I have stated 

RE: Include MKLDNN into default mxnet pip package

2018-11-21 Thread Zhao, Patric
Hi Kellen,

Thank you very much for your recognition of our work :) 

This is a great joint effort between the community (Wu Jun, Zheng Da, etc.) and 
the Intel team.

We are continuously improving the quantization flow, and more amazing features 
will be ready soon.

Thanks,

--Patric

> -Original Message-
> From: kellen sunderland [mailto:kellen.sunderl...@gmail.com]
> Sent: Thursday, November 22, 2018 9:07 AM
> To: dev@mxnet.incubator.apache.org
> Cc: d...@mxnet.apache.org
> Subject: Re: Include MKLDNN into default mxnet pip package
> 
> I've spent the last few days testing MXNet w/ MKLDNN and quantized models
> and it's a beast.  Really good speed improvements on my models, no bugs
> that I've noticed.
> 
> I'm in general supportive but I'm still wondering what the story is like when
> there's no AVX instructions present on CPUs.  Do we get an illegal instruction
> error, or does it fallback gracefully?  So far it sounds like it works on a
> Threadripper and Xen AMD CPU.  I can try on a Ryzen.  What about older
> Intel or AMD CPUs?
> 
> On Wed, Nov 21, 2018 at 4:55 PM Zai, Alexander
> 
> wrote:
> 
> > AMD benchmarks have been published. We are seeing a x15.8 speedup
> with
> > Resnet50 (batch size 32) on AWS's new m5a.24xlarge machine. With a
> > smaller network (Mobilenet - batch size 32) the speedup is more
> > significant at x38.7. Let's have a vote to see if the PR to have
> > MKLDNN enabled by default
> > (https://github.com/apache/incubator-mxnet/pull/12591) can be merged
> > before 1.4.0 release.
> >
> > On 10/19/18, 9:17 AM, "Pedro Larroy" 
> > wrote:
> >
> > I did  pip install mxnet-mkl==1.3.1b20181018 on an AMD Ryzen 1950X
> > and unit
> > tests are passing.
> >
> > Is this build using AVX512?  in /proc/cpuinfo I see only "avx" flag.
> > There's no "avx2" like on recent intel cpus.
> >
> > Pedro.
> >
> > On Fri, Oct 19, 2018 at 5:12 PM Hagay Lupesko 
> > wrote:
> >
> > > Awesome collaborative effort across many contributors and companies!
> > >
> > > The boost is impressive and for MXNet users to get this boost
> > "out of the
> > > box" is a great benefit and makes MXNet an even better choice.
> > >
> > > Alex - can you clarify whether there are any down sides with
> > regards to
> > > non-AVX-512 architectures, AMD CPUs, etc? Will it gracefully
> > fallback?
> > >
> > > Hagay
> > >
> > >
> > > On Fri, Oct 19, 2018, 15:46 Sergio Fernández 
> > wrote:
> > >
> > > > If there is no downside on platforms not supporting AVX512
> > instructions,
> > > > then +1
> > > >
> > > >
> > > > On Wed, Oct 17, 2018, 14:10 Alex Zai  wrote:
> > > >
> > > > > Hey all,
> > > > > We have been working hard these past few months to integrate and
> > > > stabilize
> > > > > Intel’s MKLDNN deep learning CPU accelerator into Mxnet and
> > have made
> > > > > incredible progress. On CPUs with AVX512 instructions (such
> > as
> > c5.18x)
> > > we
> > > > > have seen performance increase up to 12x and on other
> > platforms (Macs,
> > > > > AVX2) we seen a speedup of 1.5+. Full list of benchmarks can
> > be found
> > > > here
> > > > > (
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=956507
> 64
> > > > >  and https://github.com/apache/incubator-mxnet/pull/12591).
> > > > >
> > > > > Currently, using this accelerator requires the developer to
> > either pip
> > > > > install the mxnet-mkl version of mxnet or to build it
> > themselves from
> > > > > source. Given that we should try to provide the best
> > performance "out
> > > of
> > > > > the box” with mxnet we should include this in the default build.
> > The
> > > > mkldnn
> > > > > library is included with in the pip package build so it does not
> > > require
> > > > an
> > > > > external dependency.
> > > > >
> > > > > There were concerns that MKLDNN could cause regressions on
> > certain
> > > > > platforms (as it did with the tensorflow version a while
> > back); but we
> > > > > added a env flag (MXNET_MKLDNN_ENABLED) that allows users to
> > turn of
> > > this
> > > > > feature during runtime. Please bring up any other concerns
> > you may have
> > > > and
> > > > > your thoughts on including this accelerator in the default build.
> > > > >
> > > > > Best,
> > > > > Alex
> > > > >
> > > >
> > >
> >
> >
> >
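
The thread above repeatedly asks how to tell whether a given CPU supports
AVX/AVX2/AVX-512 (e.g. Pedro inspecting /proc/cpuinfo on a Ryzen). A minimal
sketch of that check, assuming a Linux host where /proc/cpuinfo exists (on
other platforms it simply reports no flags):

```python
def cpu_flags(cpuinfo_path="/proc/cpuinfo"):
    """Return the set of CPU feature flags, or an empty set off-Linux."""
    try:
        with open(cpuinfo_path) as f:
            for line in f:
                if line.startswith("flags"):
                    return set(line.split(":", 1)[1].split())
    except OSError:
        pass
    return set()


flags = cpu_flags()
# AVX-512 support shows up as several avx512* flags; avx512f is the baseline.
has_avx = "avx" in flags
has_avx2 = "avx2" in flags
has_avx512 = "avx512f" in flags
```

This mirrors Pedro's observation: a Ryzen 1950X reports `avx` (and `avx2`) but
no `avx512f`, so an AVX-512-only binary would not be usable there.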


RE: [Announce] Upcoming Apache MXNet (incubating) 1.4.0 release

2018-11-19 Thread Zhao, Patric
Thanks, Steffen. I think there is NO open issue blocking MKL-DNN from GA now.

BTW, several quantization-related PRs (#13297, #13260) are under review, and I 
think they can be merged this week.

Thanks,

--Patric


> -Original Message-
> From: Steffen Rochel [mailto:steffenroc...@gmail.com]
> Sent: Tuesday, November 20, 2018 2:57 AM
> To: dev@mxnet.incubator.apache.org
> Subject: Re: [Announce] Upcoming Apache MXNet (incubating) 1.4.0 release
> 
> On Friday the contributors working on Java API discovered a potential
> performance problem with inference using Java API vs. Python. Investigation
> is ongoing.
> As the Java API is one of the main features for the upcoming release, I
> suggest to post-pone the code freeze towards end of this week.
> 
> Please provide feedback and concern about the change in dates for code
> freeze and 1.4.0 release. I will provide updates on progress resolving the
> potential performance problem.
> 
> Patrick - do you think it is possible to resolve the remaining issues on MKL-
> DNN this week, so we can consider GA for MKL-DNN with 1.4.0?
> 
> Regards,
> Steffen
> 
> On Thu, Nov 15, 2018 at 5:26 AM Anton Chernov 
> wrote:
> 
> > I'd like to remind everyone that 'code freeze' would mean cutting a
> > v1.4.x release branch and all following fixes would need to be backported.
> > Development on master can be continued as usual.
> >
> > Best
> > Anton
> >
> > ср, 14 нояб. 2018 г. в 6:04, Steffen Rochel :
> >
> > > Dear MXNet community,
> > > the agreed plan was to establish code freeze for 1.4.0 release
> > > today. As the 1.3.1 patch release is still ongoing I suggest to
> > > post-pone the code freeze to Friday 16th November 2018.
> > >
> > > Sergey Kolychev has agreed to act as co-release manager for all
> > > tasks
> > which
> > > require committer privileges. If anybody is interested to volunteer
> > > as release manager - now is the time to speak up. Otherwise I will
> > > manage
> > the
> > > release.
> > >
> > > Regards,
> > > Steffen
> > >
> >


RE: [Announce] Upcoming Apache MXNet (incubating) 1.3.1 patch release

2018-11-05 Thread Zhao, Patric
Hi Anton,

Thanks for looking into the MKL-DNN PR.

As I understand the cwiki 
(https://cwiki.apache.org/confluence/display/MXNET/Project+Proposals+for+next+MXNet+Release),
these features will go into 1.4 rather than the 1.3.1 patch release.

Feel free to correct me :)

Thanks,

--Patric

> -Original Message-
> From: Anton Chernov [mailto:mecher...@gmail.com]
> Sent: Tuesday, November 6, 2018 3:11 AM
> To: d...@mxnet.apache.org
> Subject: Re: [Announce] Upcoming Apache MXNet (incubating) 1.3.1 patch
> release
> 
> It seems that there is a problem porting following changes to the v1.3.x
> release branch:
> 
> Implement mkldnn convolution fusion and quantization
> https://github.com/apache/incubator-mxnet/pull/12530
> 
> MKL-DNN Quantization Examples and README
> https://github.com/apache/incubator-mxnet/pull/12808
> 
> The bases are different.
> 
> I would need help from authors of these changes to make a backport PR.
> 
> @ZhennanQin, @xinyu-intel would you be able to assist me and create the
> corresponding PR's?
> 
> Without proper history and domain knowledge I would not be able to create
> them by my own in reasonable amount of time, I'm afraid.
> 
> Best regards,
> Anton
> 
> пн, 5 нояб. 2018 г. в 19:45, Anton Chernov :
> 
> >
> > As part of:
> >
> > Implement mkldnn convolution fusion and quantization
> > https://github.com/apache/incubator-mxnet/pull/12530
> >
> > I propose to add the examples and documentation PR as well:
> >
> > MKL-DNN Quantization Examples and README
> > https://github.com/apache/incubator-mxnet/pull/12808
> >
> >
> > Best regards,
> > Anton
> >
> > пн, 5 нояб. 2018 г. в 19:02, Anton Chernov :
> >
> >> Dear MXNet community,
> >>
> >> I will be the release manager for the upcoming 1.3.1 patch release.
> >> Naveen will be co-managing the release and providing help from the
> >> committers side.
> >>
> >> The following dates have been set:
> >>
> >> Code Freeze: 31st October 2018
> >> Release published: 13th November 2018
> >>
> >> Release notes have been drafted here [1].
> >>
> >>
> >> * Known issues
> >>
> >> Update MKL-DNN dependency
> >> https://github.com/apache/incubator-mxnet/pull/12953
> >>
> >> This PR hasn't been merged even to master yet. Requires additional
> >> discussion and merge.
> >>
> >> distributed kvstore bug in MXNet
> >> https://github.com/apache/incubator-mxnet/issues/12713
> >>
> >> > When distributed kvstore is used, by default gluon.Trainer doesn't
> >> > work
> >> with mx.optimizer.LRScheduler if a worker has more than 1 GPU. To be
> >> more specific, the trainer updates once per GPU, the LRScheduler
> >> object is shared across GPUs and get a wrong update count.
> >>
> >> This needs to be fixed. [6]
> >>
> >>
> >> * Changes
> >>
> >> The following changes will be ported to the release branch, per [2]:
> >>
> >> Infer dtype in SymbolBlock import from input symbol [3]
> >> https://github.com/apache/incubator-mxnet/pull/12412
> >>
> >> [MXNET-953] Fix oob memory read
> >> https://github.com/apache/incubator-mxnet/pull/12631
> >>
> >> [MXNET-969] Fix buffer overflow in RNNOp
> >> https://github.com/apache/incubator-mxnet/pull/12603
> >>
> >> [MXNET-922] Fix memleak in profiler
> >> https://github.com/apache/incubator-mxnet/pull/12499
> >>
> >> Implement mkldnn convolution fusion and quantization (MXNet Graph
> >> Optimization and Quantization based on subgraph and MKL-DNN
> proposal
> >> [4])
> >> https://github.com/apache/incubator-mxnet/pull/12530
> >>
> >> Following items (test cases) should be already part of 1.3.0:
> >>
> >> [MXNET-486] Create CPP test for concat MKLDNN operator
> >> https://github.com/apache/incubator-mxnet/pull/11371
> >>
> >> [MXNET-489] MKLDNN Pool test
> >> https://github.com/apache/incubator-mxnet/pull/11608
> >>
> >> [MXNET-484] MKLDNN C++ test for LRN operator
> >> https://github.com/apache/incubator-mxnet/pull/11831
> >>
> >> [MXNET-546] Add unit test for MKLDNNSum
> >> https://github.com/apache/incubator-mxnet/pull/11272
> >>
> >> [MXNET-498] Test MKLDNN backward operators
> >> https://github.com/apache/incubator-mxnet/pull/11232
> >>
> >> [MXNET-500] Test cases improvement for MKLDNN on Gluon
> >> https://github.com/apache/incubator-mxnet/pull/10921
> >>
> >> Set correct update on kvstore flag in dist_device_sync mode (as part
> >> of fixing [5])
> >> https://github.com/apache/incubator-mxnet/pull/12786
> >>
> >> upgrade mshadow version
> >> https://github.com/apache/incubator-mxnet/pull/12692
> >> But another PR will be used instead:
> >> update mshadow
> >> https://github.com/apache/incubator-mxnet/pull/12674
> >>
> >> CudnnFind() usage improvements
> >> https://github.com/apache/incubator-mxnet/pull/12804
> >> A critical CUDNN fix that reduces GPU memory consumption and
> >> addresses this memory leak issue. This is an important fix to include
> >> in 1.3.1
> >>
> >>
> >> From discussion about gluon toolkits:
> >>
> >> disable opencv threading for forked process
> >> 

RE: Include MKLDNN into default mxnet pip package

2018-10-18 Thread Zhao, Patric
Thanks Alex for bringing up this proposal. As far as I know, with the MKL-DNN 
backend, MXNet is now the most performant framework on the CPU side, especially 
since the recent subgraph fusion feature boosted performance significantly again. 
Thus, I think it is worth making it the default so that more users can leverage 
its benefits.

Regarding the MKL-DNN integration, it is a joint effort that took a lot of work 
from Amazon and Intel engineers, including Da, Jun, Haibin, Junyuan, Sheng, 
Marco, Chris (AWS) and Patric, Tao, Wenting, Rong, Jin, Shufan, Ashok (Intel).
We also got many great suggestions from the MXNet community and learned much from 
those discussions. Here I personally want to thank Da Zheng for his great 
efforts in this project. 
As the main contributor, he played an important role throughout, from the 
initial co-design and implementation to the recent advanced subgraph feature, 
and finally made these good things happen.

I would also like to thank Alex for stabilizing the MKL-DNN backend by adding 
more tests, along with an environment variable so users can easily switch 
between the original flow and the MKL-DNN flow. 
His efforts have been very helpful in pushing the MKL-DNN backend from 
experimental toward GA.

The MXNet community is one of the best groups, and there are many intelligent 
people here. 

Thank you all for the strong support.

--Patric 
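
The runtime switch mentioned above works through the MXNET_MKLDNN_ENABLED
environment variable. A hedged sketch of using it — the flag is read when MXNet
initializes, so it should be set before the import; the mxnet import itself is
left commented out here so the snippet stays self-contained:

```python
import os


def disable_mkldnn():
    """Ask MXNet to fall back to the default (non-MKL-DNN) CPU flow.

    Must run before `import mxnet`, since the flag is read at startup.
    """
    os.environ["MXNET_MKLDNN_ENABLED"] = "0"


disable_mkldnn()
# import mxnet as mx   # would now use the non-MKL-DNN CPU implementations
assert os.environ["MXNET_MKLDNN_ENABLED"] == "0"
```

This gives users an escape hatch if MKL-DNN-by-default causes a regression on
their platform, which was the main concern raised in this thread.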

> -Original Message-
> From: Jun Wu [mailto:wujun@gmail.com]
> Sent: Thursday, October 18, 2018 6:29 AM
> To: dev@mxnet.incubator.apache.org
> Cc: d...@mxnet.apache.org; aza...@gmail.com
> Subject: Re: Include MKLDNN into default mxnet pip package
> 
> If my understanding is correct about the context, it should be acknowledged
> that the significant performance improvement comes from the Intel
> MKLDNN team's contribution in this PR:
> https://github.com/apache/incubator-mxnet/pull/12530.
> 
> On Wed, Oct 17, 2018 at 3:12 PM kellen sunderland <
> kellen.sunderl...@gmail.com> wrote:
> 
> > First of all thanks to Intel for these improvements, really a great effort.
> >
> > What would the compatibility story look like for users that don't have
> > these AVX instructions?  Would there be any negative affect for AMD users?
> >
> > Regarding TensorRT: It's a possibility but not planned in the short
> > term. A few considerations would be the limits on PyPi package sizes
> > and the bloat incurred with TRT, the requirements of TRT to be
> > installed on the user side, and the TRT engine build times which are
> > non-trivial.  We can work towards fixing or working around these
> > issues in the future if default TRT is something the user community
> > would like to see for Cuda packages.  While the feature is
> > experimental we'll likely continue to use 'mxnet-tensorrt-cu92' and
> 'mxnet-tensorrt-cu90'.
> >
> > On Wed, Oct 17, 2018 at 2:12 PM Alfredo Luque
> >  wrote:
> >
> > > This is huge. Thanks for working on this. Is there a similar plan
> > > with
> > eg;
> > > tensor-rt support being ported into the main cuda-9.x packages?
> > >
> > > On October 17, 2018 at 2:10:20 PM, Alex Zai (aza...@gmail.com) wrote:
> > >
> > > Hey all,
> > > We have been working hard these past few months to integrate and
> > stabilize
> > > Intel’s MKLDNN deep learning CPU accelerator into Mxnet and have
> > > made incredible progress. On CPUs with AVX512 instructions (such as
> > > c5.18x) we have seen performance increase up to 12x and on other
> > > platforms (Macs,
> > > AVX2) we seen a speedup of 1.5+. Full list of benchmarks can be
> > > found
> > here
> > > (
> > >
> >
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=95650
> > 764
> > > and https://github.com/apache/incubator-mxnet/pull/12591).
> > >
> > > Currently, using this accelerator requires the developer to either
> > > pip install the mxnet-mkl version of mxnet or to build it themselves
> > > from source. Given that we should try to provide the best
> > > performance "out of the box” with mxnet we should include this in
> > > the default build. The
> > mkldnn
> > > library is included with in the pip package build so it does not
> > > require
> > an
> > > external dependency.
> > >
> > > There were concerns that MKLDNN could cause regressions on certain
> > > platforms (as it did with the tensorflow version a while back); but
> > > we added a env flag (MXNET_MKLDNN_ENABLED) that allows users to
> turn
> > > of this feature during runtime. Please bring up any other concerns
> > > you may have
> > and
> > > your thoughts on including this accelerator in the default build.
> > >
> > > Best,
> > > Alex
> > >
> > > —
> > > Alfredo Luque
> > > Software Engineer
> > > Machine Learning Infrastructure
> > > Airbnb
> > > San Francisco, CA
> > >
> >


RE: Proposal for subgraph based the graph optimization and quantization

2018-10-03 Thread Zhao, Patric
Sending this again: our PR was submitted 22 days ago.

Please help review it if you're interested :)
https://github.com/apache/incubator-mxnet/pull/12530

Thanks for the great suggestions from Jun, Da, Haibin and other committers.

BR,

--Patric


From: Zhao, Patric
Sent: Wednesday, August 15, 2018 10:51 AM
To: dev@mxnet.incubator.apache.org
Cc: Zheng, Da ; Jun Wu ; Ye, Jason Y 

Subject: Proposal for subgraph based the graph optimization and quantization

Hi MXNet owners and committers,

A new proposal is posted in the wiki for the graph optimization and 
quantization approach.
https://cwiki.apache.org/confluence/display/MXNET/MXNet+Graph+Optimization+and+Quantization+based+on+subgraph+and+MKL-DNN

Really thanks for the supports from Zheng Da and Wu Jun.

Any feedbacks are highly appreciated :)

BR,

--Patric
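
The quantization flow in the proposal above maps float32 tensors to int8. The
core scale computation can be illustrated with a generic symmetric-quantization
sketch. This is a textbook illustration under an assumed symmetric int8 scheme,
not the actual MXNet/MKL-DNN implementation, which also handles calibration and
per-channel scales:

```python
def quantize_symmetric(values, num_bits=8):
    """Map floats to signed ints using a single symmetric scale."""
    qmax = 2 ** (num_bits - 1) - 1              # 127 for int8
    max_abs = max(abs(v) for v in values) or 1.0
    scale = qmax / max_abs
    quantized = [max(-qmax, min(qmax, round(v * scale))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float values from the quantized ints."""
    return [q / scale for q in quantized]


data = [-0.4, 0.0, 0.25, 1.0]
q, s = quantize_symmetric(data)
restored = dequantize(q, s)
# Round-trip error is bounded by half a quantization step (0.5 / scale).
assert all(abs(a - b) <= 0.5 / s for a, b in zip(data, restored))
```

The speedup then comes from running the fused subgraph's matrix multiplies on
int8 inputs, paying the (de)quantization cost only at subgraph boundaries.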


RE: [Discuss] Next MXNet release

2018-10-01 Thread Zhao, Patric
Thanks for letting us know about this discussion. 
We don't have enough bandwidth to track all the different sources, such as the 
discussion forum.

I think the best way is to open an issue on GitHub so that we can answer/solve 
it in time :)

Thanks,

--Patric

> -Original Message-
> From: Afrooze, Sina [mailto:sina@gmail.com]
> Sent: Tuesday, October 2, 2018 1:14 AM
> To: dev@mxnet.incubator.apache.org
> Cc: Ye, Jason Y ; Zai, Alexander
> ; Zheng, Da 
> Subject: Re: [Discuss] Next MXNet release
> 
> This post suggests there is a regression from 1.1.0 to 1.2.1 related to
> MKLDNN integration: https://discuss.mxnet.io/t/mxnet-1-2-1-module-get-
> outputs/1882
> 
> The error is related to MKLDNN layout not being converted back to MXNet
> layout in some operator: " !IsMKLDNNData() We can’t generate TBlob for
> MKLDNN data. Please use Reorder2Default() to generate a new NDArray
> first"
> 
> Sina
> 
> 
> 
> 
> On 9/30/18, 6:55 PM, "Steffen Rochel"  wrote:
> 
> Thanks Patrick.
> Updated roadmap and next release content.
> 
> Patrick - suggest to send a reminder to review the design doc and collect
> feedback.
> Are there still known issues or gaps before we declare MKL-DNN
> integration
> as GA?
> 
> Regards,
> Steffen
> 
> On Sat, Sep 29, 2018 at 1:31 AM Zhao, Patric 
> wrote:
> 
> > Thanks, Steffen.
> >
> > Regarding the next release note, two items from our side:
> >
> > 1. (-remove) MKL-DNN integration is done. I think we can remove this
> item.
> > 2. (+add) MKL-DNN based graph optimization and quantization by
> subgraph
> > Design doc:
> >
> https://cwiki.apache.org/confluence/display/MXNET/MXNet+Graph+Optimiz
> ation+and+Quantization+based+on+subgraph+and+MKL-DNN
> > Lead Contributor: Patric Zhao,  https://github.com/pengzhao-intel/
> >
> > Regarding the Roadmap
> > (+add) Q1 2019: MKL-DNN RNN API supports
> >
> > BR,
> >
> > Thanks,
> >
> > --Patric
> >
> >
> > > -Original Message-
> > > From: kellen sunderland [mailto:kellen.sunderl...@gmail.com]
> > > Sent: Saturday, September 29, 2018 11:31 AM
> > > To: dev@mxnet.incubator.apache.org
> > > Subject: Re: [Discuss] Next MXNet release
> > >
> > > Sorry I meant to say next 'Regarding the *minor* release'.
> > >
> > > On Sat, Sep 29, 2018 at 5:27 AM kellen sunderland <
> > > kellen.sunderl...@gmail.com> wrote:
> > >
> > > > Thanks for transparently setting a rough timeline Steffen.  I think
> > > > this will go a long way in helping the community plan their work, 
> even
> > > > if the details change somewhat on the road to the release.
> > > >
> > > > Regarding the major release: I would propose we unify TensorRT with
> > > > the subgraph operator work.
> > > >
> > > > Regarding the patch release:  There were a few minor stack/buffer
> > > > overflows exposed by ASAN that have been addressed.  It's probably
> a
> > > > good idea to include them in a patch release, as they at best result
> > > > in non-deterministic behaviour.
> > > >
> > > > -Kellen
> > > >
> > > >
> > > > On Sat, Sep 29, 2018 at 1:39 AM Steffen Rochel
> > > > 
> > > > wrote:
> > > >
> > > >> I updated
> > > >>
> > > >>
> https://cwiki.apache.org/confluence/display/MXNET/Project+Proposals+f
> > > >> or+next+MXNet+Release
> > > >> ,
> > > >> removed the completed items from 1.3 release and would like to
> kick
> > > >> off discussion about the next release. Please suggest what you
> would
> > > >> like to see included in the next release together with link to 
> design
> > > >> proposal (appropriately for the size and complexity of the 
> proposal)
> > > >> or suggest changes.
> > > >> I suggest to target the next release for December 2018 to frame the
> > > >> discussion.
> > > >> Lets include review of
> > > >>
> https://cwiki.apache.org/confluence/display/MXNET/MXNet+Roadmap -
> > > >> time to update and discuss changes.
> > > >>
> > > >> From the 1.3 release we had discussion regarding
> > > >> https://github.com/apache/incubator-mxnet/issues/11849 and
> resolution
> > > >> in
> > > >> https://github.com/apache/incubator-mxnet/pull/12412 .
> > > >> Are you aware of critical issues and feedback from user which we
> > > >> should consider for a potential 1.3.1 patch release. Should we
> > > >> include PR 12412 in a potential patch release?
> > > >>
> > > >> Regards,
> > > >> Steffen
> > > >>
> > > >
> >
> 
> 



RE: [Discuss] Next MXNet release

2018-10-01 Thread Zhao, Patric
Thanks, Steffen. 

I will send the reminder again; currently Da, Jun, Haibin and Marco are 
reviewing our first PR (#12530).

Regarding the MKL-DNN integration, the MKL-DNN backend has reached GA quality 
now, in my view.
In the last development cycle, many tests, both unit tests and real models, 
were added to improve quality, and we don't see any big defects in the current 
solution. 

Many thanks for the efforts from Alex and Shufan in adding a batch of test cases.
1) Unit tests
For example, the PRs for concat (#11371), pool (#11608), LRN (#11831), sum 
(#11272), backward (#11232), and gluon (#10921). 
The new CPP test located in 
https://github.com/apache/incubator-mxnet/blob/master/tests/cpp/operator/mkldnn.cc
 and 
the gluon test in 
https://github.com/apache/incubator-mxnet/blob/master/tests/python/unittest/test_gluon.py.

2) Model level
Model-level coverage, including CV and non-CV models, is tracked weekly on our 
local servers against the official master branch.
The CV tests include ResNet-50, Inception-BN, SSD, etc.; the non-CV tests 
include sockeye/GNMT, lstm_bucketing models, etc.
All models we track converge with the expected accuracy and performance. 
 

BTW, is there a checklist for grading? If so, it would be easy to evaluate 
objectively :)

Thanks,

--Patric



> -Original Message-
> From: Steffen Rochel [mailto:steffenroc...@gmail.com]
> Sent: Monday, October 1, 2018 9:54 AM
> To: dev@mxnet.incubator.apache.org
> Cc: Ye, Jason Y ; Zai, Alexander
> ; Zheng, Da 
> Subject: Re: [Discuss] Next MXNet release
> 
> Thanks Patrick.
> Updated roadmap and next release content.
> 
> Patrick - suggest to send a reminder to review the design doc and collect
> feedback.
> Are there still known issues or gaps before we declare MKL-DNN integration
> as GA?
> 
> Regards,
> Steffen
> 
> On Sat, Sep 29, 2018 at 1:31 AM Zhao, Patric  wrote:
> 
> > Thanks, Steffen.
> >
> > Regarding the next release note, two items from our side:
> >
> > 1. (-remove) MKL-DNN integration is done. I think we can remove this item.
> > 2. (+add) MKL-DNN based graph optimization and quantization by
> subgraph
> > Design doc:
> >
> https://cwiki.apache.org/confluence/display/MXNET/MXNet+Graph+Optimiz
> ation+and+Quantization+based+on+subgraph+and+MKL-DNN
> > Lead Contributor: Patric Zhao,  https://github.com/pengzhao-intel/
> >
> > Regarding the Roadmap
> > (+add) Q1 2019: MKL-DNN RNN API supports
> >
> > BR,
> >
> > Thanks,
> >
> > --Patric
> >
> >
> > > -Original Message-
> > > From: kellen sunderland [mailto:kellen.sunderl...@gmail.com]
> > > Sent: Saturday, September 29, 2018 11:31 AM
> > > To: dev@mxnet.incubator.apache.org
> > > Subject: Re: [Discuss] Next MXNet release
> > >
> > > Sorry I meant to say next 'Regarding the *minor* release'.
> > >
> > > On Sat, Sep 29, 2018 at 5:27 AM kellen sunderland <
> > > kellen.sunderl...@gmail.com> wrote:
> > >
> > > > Thanks for transparently setting a rough timeline Steffen.  I
> > > > think this will go a long way in helping the community plan their
> > > > work, even if the details change somewhat on the road to the release.
> > > >
> > > > Regarding the major release: I would propose we unify TensorRT
> > > > with the subgraph operator work.
> > > >
> > > > Regarding the patch release:  There were a few minor stack/buffer
> > > > overflows exposed by ASAN that have been addressed.  It's probably
> > > > a good idea to include them in a patch release, as they at best
> > > > result in non-deterministic behaviour.
> > > >
> > > > -Kellen
> > > >
> > > >
> > > > On Sat, Sep 29, 2018 at 1:39 AM Steffen Rochel
> > > > 
> > > > wrote:
> > > >
> > > >> I updated
> > > >>
> > > >> https://cwiki.apache.org/confluence/display/MXNET/Project+Proposa
> > > >> ls+f
> > > >> or+next+MXNet+Release
> > > >> ,
> > > >> removed the completed items from 1.3 release and would like to
> > > >> kick off discussion about the next release. Please suggest what
> > > >> you would like to see included in the next release together with
> > > >> link to design proposal (appropriately for the size and
> > > >> complexity of the proposal) or suggest changes.
> > > >> I suggest to target the next release for December 2018 to frame
> > > >> the discussion.
> > > >> Lets include review of
> > > >>
> https://cwiki.apache.org/confluence/display/MXNET/MXNet+Roadmap -
> > > >> time to update and discuss changes.
> > > >>
> > > >> From the 1.3 release we had discussion regarding
> > > >> https://github.com/apache/incubator-mxnet/issues/11849 and
> > > >> resolution in
> > > >> https://github.com/apache/incubator-mxnet/pull/12412 .
> > > >> Are you aware of critical issues and feedback from user which we
> > > >> should consider for a potential 1.3.1 patch release. Should we
> > > >> include PR 12412 in a potential patch release?
> > > >>
> > > >> Regards,
> > > >> Steffen
> > > >>
> > > >
> >


RE: Mentor changes

2018-09-27 Thread Zhao, Patric
Welcome, Jason. I think MXNet will achieve great success, just as BigDL has.

Looking forward to working with you :)


> -Original Message-
> From: Hen [mailto:bay...@apache.org]
> Sent: Friday, September 28, 2018 8:23 AM
> To: dev@mxnet.incubator.apache.org
> Cc: Jim Jagielski ; Michael Wall ; Bob
> Paulin ; Jason Dai 
> Subject: Mentor changes
> 
> I'd like to welcome four additional mentors (cc'd) for MXNet :)
> 
>  * Jason Dai;
>  * Jim Jagielski;
>  * Bob Paulin; and
>  * Michael Wall.
> 
> Suneel Marthi has also stepped back from mentoring.
> 
> Thank you to each of our new mentors for joining in, and many thanks to
> Suneel for the time he's given over the last 2 years.
> 
> Hen


RE: Release plan - MXNET 1.3

2018-08-21 Thread Zhao, Patric
Hi Roshani,

Good notes :) 

Several items about performance and MKL-DNN are listed below; please help 
review them.

@Da, Alex, if anything about MKL-DNN is missed, feel free to add.

*Performance improvement
+Support for dot(dns, csr) = dns and dot(dns, csr.T) = dns on CPU
https://github.com/apache/incubator-mxnet/pull/3
+Performance improvement for Batch Dot on CPU from mshadow
https://github.com/dmlc/mshadow/pull/342
-Fix the topk regression issue (#12197)
This is a bug fix rather than a performance improvement.


*MKL-DNN
More functionality supports:
+Support more activation functions, "sigmoid", "tanh", "softrelu" 
https://github.com/apache/incubator-mxnet/pull/10336

Debugging functionality:
+Result check 
https://github.com/apache/incubator-mxnet/pull/12069
+Backend switch
https://github.com/apache/incubator-mxnet/pull/12058

Thanks,

--Patric

> -Original Message-
> From: Roshani Nagmote [mailto:roshaninagmo...@gmail.com]
> Sent: Wednesday, August 22, 2018 1:53 AM
> To: dev@mxnet.incubator.apache.org
> Subject: Re: Release plan - MXNET 1.3
> 
> Hi,
> 
> Thank you everyone for helping to clear release blockers. CI tests were 
> failing
> so we delayed RC by some time. But now the tests are passing and we are
> ready to cut the release branch.
> 
> I have drafted release notes here:
> https://cwiki.apache.org/confluence/display/MXNET/Apache+MXNet+%28in
> cubating%29+1.3.0+Release+Notes
> 
> 
> Please take a look and update if I have missed anything. I will be cutting
> RC0 tomorrow.
> 
> Thanks,
> Roshani
> 
> On Thu, Aug 16, 2018 at 2:28 PM Roshani Nagmote
> 
> wrote:
> 
> > Sure will do. thanks.
> >
> > -Roshani
> >
> > On Thu, Aug 16, 2018 at 11:53 AM Afrooze, Sina 
> wrote:
> >
> >> Hi Roshani - Can you please make sure that this fix (which is already
> >> merged to master) is also merged to the stable branch for 1.3.0:
> >> https://github.com/apache/incubator-mxnet/pull/11493 - Thanks, Sina
> >>
> >>
> >> On 8/16/18, 10:51 AM, "Roshani Nagmote"
> 
> >> wrote:
> >>
> >> Hi all,
> >>
> >> Release status:
> >>
> >> Currently, for release 1.3.0 there are a couple of issues open
> >> which needs
> >> to be resolved before cutting RC.
> >>
> >> The current date we are looking at for cutting RC0 is 08/17(Friday).
> >>
> >>
> >>
> >> Open issues which need to be looked at before cutting RC:
> >>
> >>1. Topk regression issue
> >> -
> >> #12202 PR
> >>with fix 
> >>2. Excessive memory allocation issue
> >> -
> >> #12184 PR
> >>with fix 
> >>3. Test_io.test_csvIter breaks on CentOS
> >> -
> >> #12189 PR
> >>with fix
> >> 
> >>
> >>
> >>
> >> @committers, could you please help review these PRs and get them
> >> merged?
> >>
> >>
> >>
> >> Thanks,
> >>
> >> Roshani
> >>
> >> On Tue, Aug 14, 2018 at 12:46 PM Roshani Nagmote <
> >> roshaninagmo...@gmail.com>
> >> wrote:
> >>
> >> > Talked to the person who ran resnet50 benchmarks offline. Build
> >> flag was
> >> > not properly set so there was a difference in performance
> >> numbers observed.
> >> > There is no issue caught and he was able to get the same results as
> >> > mentioned here https://mxnet.incubator.apache.org/faq/perf.html
> >> > 
> >> >
> >> > We are good here.
> >> >
> >> > Thanks,
> >> > Roshani
> >> >
> >> > On Mon, Aug 13, 2018 at 4:08 PM Roshani Nagmote <
> >> roshaninagmo...@gmail.com>
> >> > wrote:
> >> >
> >> >> Hi Dom,
> >> >>
> >> >> I verified resnet50 run on MXNet master branch. Checked on
> >> single gpu
> >> >> machine. Numbers match. I didn't see any performance degradation.
> >> >> https://mxnet.incubator.apache.org/faq/perf.html#scoring-results
> >> >>
> >> >> Can you please give me more details on the instance type and
> >> script you
> >> >> ran exactly so that I can try to reproduce it again?
> >> >>
> >> >> Thanks,
> >> >> Roshani
> >> >>
> >> >>
> >> >> On Mon, Aug 13, 2018 at 12:31 PM Roshani Nagmote <
> >> >> roshaninagmo...@gmail.com> wrote:
> >> >>
> >> >>> This is not a major feature. I meant other new feature
> >> requests PR won't
> >> >>> be accepted in 1.3 release now.
> >> >>> Bug fixes will be accepted. I will be trying to reproduce the
> >> regression
> >> >>> Dom mentioned today. :)
> >> >>>
> >> >>> Thanks,
> >> >>> Roshani
> >> >>>
> >> >>> On Mon, Aug 13, 2018 at 12:06 PM Naveen Swamy
> >>  >> >
> >> >>> wrote:
> >> >>>
> >> 

RE: Release blocker? - buggy topk Op

2018-08-16 Thread Zhao, Patric
Hi Leonard,

Thanks for raising the topk op issue.

The root cause is the current API design, which uses a float data type to 
represent the integer index; a float cannot represent large integers precisely.
(No offense intended; I may be missing some background, and I think the overall 
design is very good.)

The recent change (#12085) alters the computation order and makes this issue 
more visible. Essentially, the bug occurs whenever the index is large, with or 
without that change.
One line of example code triggers the issue: 
'print(mx.nd.topk(mx.nd.array(np.arange(256*300096).reshape(8, -1)), k=4))'.
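A small pure-Python demonstration of why a float-typed index breaks down (illustrative only, not MXNet code): IEEE-754 float32 carries a 24-bit significand, so not every integer above 2**24 is representable, and the example tensor above has 256 * 300096 = 76,824,576 elements when flattened.

```python
import struct

def to_float32(x):
    """Round-trip a Python float through IEEE-754 binary32 (float32)."""
    return struct.unpack('f', struct.pack('f', x))[0]

# Consecutive integers above 2**24 = 16777216 are no longer all
# representable in float32:
print(to_float32(2**24))      # 16777216.0
print(to_float32(2**24 + 1))  # 16777216.0 -- collapses onto its neighbour
# So a flattened index beyond 2**24, stored as float32, can silently
# point at the wrong element -- exactly the symptom reported in #12197.
```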

Thus, the real fix is to change the API and use an integer type for the index. 
However, that would introduce compatibility issues for existing 
frameworks/topologies due to the API change.
I am not sure we should make such a change at the last minute of the 1.3 
release (though we would be happy to contribute to it).

For now, we have submitted a fix (#12202) that keeps the computation order the 
same as before while remaining much faster :)

Apologies for the confusion and feel free to let us know for any feedback.

Thanks,

--Patric


> -Original Message-
> From: Leonard Lausen [mailto:l-softw...@lausen.nl]
> Sent: Thursday, August 16, 2018 9:51 AM
> To: dev@mxnet.incubator.apache.org
> Subject: Release blocker? - buggy topk Op
> 
> Recent changes in mxnet master introduced a bug into the topk operator.
>  Below code example will output [ 274232. 179574. 274233. 274231.] with
>  mxnet-cu90==1.3.0b20180810 but [ 274232. 179574. 274232. 274232.] with
> mxnet-cu90==1.3.0b20180814. Likely #12085 is at fault.
> 
> See https://github.com/apache/incubator-mxnet/issues/12197 for more info.
> 
> I think this should be considered a release blocker for the 1.3 release.
> 
> Note this breaks some parts of the KDD 18 MXNet / Gluon tutorial which is
> scheduled for next Tuesday http://www.kdd.org/kdd2018/hands-on-
> tutorials/view/mxnet-with-a-focus-on-nlp
> . (We can work around by asking people to install the 0810 version
> though.)



Proposal for subgraph based the graph optimization and quantization

2018-08-14 Thread Zhao, Patric
Hi MXNet owners and committers,

A new proposal is posted in the wiki for the graph optimization and 
quantization approach.
https://cwiki.apache.org/confluence/display/MXNET/MXNet+Graph+Optimization+and+Quantization+based+on+subgraph+and+MKL-DNN

Many thanks for the support from Zheng Da and Wu Jun.

Any feedback is highly appreciated :)

BR,

--Patric


Suggestions for Design Proposal Template

2018-08-08 Thread Zhao, Patric
Hi MXNet owner,

We (Intel engineers) have already written up several design proposals and 
published them to cwiki.
I really like this template document; it makes things very clear.
https://cwiki.apache.org/confluence/display/MXNET/Apache+MXNet+Design+Proposal+Template

Furthermore, I suggest adding a "Feedback from MXNet owners 
(committers)" section.
It would be better to assign each proposal to committers and record the 
committer's name in the doc.
The assigned committer should give a clear suggestion/decision on the proposal 
within a set time window (maybe two weeks).

I know this requires extra effort from the committers and owners,
but it would make the whole project more efficient and give us a clear goal.


Thanks,

--Patric









RE: [DISCUSSION] Initial draft for MXNet roadmap

2018-07-05 Thread Zhao, Patric
Hi Steffen,

It's really great to share the MXNet roadmap with the community,
so we all have a clear picture and can align our strategies.

Regarding the Q3 plan item "High quality support for MKL (incl. MKL-DNN)", it 
closely matches our own plan.
We will move the current solution onto the subgraph mechanism and make the 
backend more robust.

Meanwhile, another task on our roadmap is the INT8 quantization flow with 
MKL-DNN.
The current flow is still marked as experimental and does not use the subgraph 
mechanism.
So, in Q3 we will focus on bringing the quantization flow to GA via subgraph.
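For context, INT8 quantization in this style maps float tensors onto 8-bit integers via a per-tensor scale. A minimal symmetric-quantization sketch (illustrative only; not the MXNet/MKL-DNN implementation):

```python
def quantize_int8(values):
    """Symmetric per-tensor INT8 quantization: q = round(x / scale),
    with scale chosen so the max-magnitude value maps to +/-127."""
    scale = max(abs(v) for v in values) / 127.0
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float values from the INT8 codes."""
    return [x * scale for x in q]

q, scale = quantize_int8([0.5, -1.0, 0.25])
print(q)                       # [64, -127, 32]
print(dequantize(q, scale)[1]) # approximately -1.0
```

INT8 ops then run on the quantized codes, trading a small, bounded rounding error for much cheaper integer arithmetic.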

Would you mind adding the quantization work to the MXNet Q3 roadmap?

Thanks,

--Patric




> -Original Message-
> From: Steffen Rochel [mailto:steffenroc...@gmail.com]
> Sent: Thursday, July 5, 2018 2:05 AM
> To: dev@mxnet.incubator.apache.org
> Subject: [DISCUSSION] Initial draft for MXNet roadmap
> 
> As a project contributor, I published an initial draft for MXNet roadmap at
> https://cwiki.apache.org/confluence/display/MXNET/MXNet+Roadmap
> The initial draft is based on offline discussion with various contributors and
> committers including Mu, Junyuan and AWS developer community.
> 
> Please review and suggest changes and enhancements.
> Please also review https://spark.apache.org/improvement-proposals.html and
> share your thoughts if the project should adopt a similar process or suggest
> something you think is more appropriate.
> 
> Regards,
> Steffen


RE: MKLDNN Integration Stable Release

2018-07-03 Thread Zhao, Patric
Thanks, Sandeep,

> * If all existing RNN integration tests pass with MKL-DNN build, this should 
> give
> enough confidence?

This should be the baseline for the merge. We still need to confirm the 
performance gain from the new API.

> * Also, I remember one of the community member saying "mxnet-mkl" pypi
> package is not compiled with MKLDNN. Not sure about this, but, can we please
> confirm?

Since 1.2.0, "*-mkl" has been compiled with MKL-DNN, but not with the MKL BLAS 
library (it still uses OpenBLAS from PyPI).
We are working on this item and trying to get both MKL-DNN and the MKL library 
compiled into "*-mkl".
 

> -Original Message-
> From: sandeep krishnamurthy [mailto:sandeep.krishn...@gmail.com]
> Sent: Wednesday, July 4, 2018 11:27 AM
> To: dev@mxnet.incubator.apache.org
> Subject: Re: MKLDNN Integration Stable Release
> 
> * If all existing RNN integration tests pass with MKL-DNN build, this should 
> give
> enough confidence?
> * Also, I remember one of the community member saying "mxnet-mkl" pypi
> package is not compiled with MKLDNN. Not sure about this, but, can we please
> confirm?
> 
> Best,
> Sandeep
> 
> On Tue, Jul 3, 2018 at 7:37 PM Zhao, Patric  wrote:
> 
> > Hi Alex,
> >
> > Regarding RNN, the first version of MKL-DNN RNN API is available in
> > the MKL-DNN master branch.
> > We have integrated it in our local branch and you can try our code
> > (still in developments).
> >
> >
> > https://github.com/lihaofd/incubator-mxnet/blob/mkldnn-rnn/src/operato
> > r/nn/mkldnn/mkldnn_rnn_impl.h
> >
> > We plan to PR our integration into MXNET master when both
> > functionality and performance are qualified.
> >
> > Thanks,
> >
> > --Patric
> >
> > > -Original Message-
> > > From: Alex Zai [mailto:aza...@gmail.com]
> > > Sent: Wednesday, July 4, 2018 1:17 AM
> > > To: dev@mxnet.incubator.apache.org
> > > Subject: MKLDNN Integration Stable Release
> > >
> > > We are preparing a stable release of MKL-DNN integration in 1.3.0
> > > (experimental since 1.2.0), which supports acceleration of
> > > operations
> > such as
> > > Convolution, Deconvolution, FullyConnected, Pooling, Batch
> > > Normalization, Activation, LRN, Softmax. Currently the RNN operator
> > > is not supported as
> > the
> > > MKL-DNN API is still experimental; however, they hope to release a
> > > more stable version RNN API this or next week in MKL-DNN 0.15.
> > >
> > > We will have CPP unit test support on these operators and I am
> > > planning
> > to
> > > write python unit tests to compare a RNN network's results from the
> > MKLDNN
> > > backend with that of the GPU to test accuracy. Is there any
> > > additional
> > coverage
> > > that you think we should cover in the next two weeks?
> > >
> > > Alex
> >
> 
> 
> --
> Sandeep Krishnamurthy


RE: MKLDNN Integration Stable Release

2018-07-03 Thread Zhao, Patric
Hi Alex,

Regarding RNN, the first version of MKL-DNN RNN API is available in the MKL-DNN 
master branch.
We have integrated it in our local branch and you can try our code (still in 
development).

https://github.com/lihaofd/incubator-mxnet/blob/mkldnn-rnn/src/operator/nn/mkldnn/mkldnn_rnn_impl.h

We plan to PR our integration into MXNET master when both functionality and 
performance are qualified.

Thanks,

--Patric  

> -Original Message-
> From: Alex Zai [mailto:aza...@gmail.com]
> Sent: Wednesday, July 4, 2018 1:17 AM
> To: dev@mxnet.incubator.apache.org
> Subject: MKLDNN Integration Stable Release
> 
> We are preparing a stable release of MKL-DNN integration in 1.3.0
> (experimental since 1.2.0), which supports acceleration of operations such as
> Convolution, Deconvolution, FullyConnected, Pooling, Batch Normalization,
> Activation, LRN, Softmax. Currently the RNN operator is not supported as the
> MKL-DNN API is still experimental; however, they hope to release a more
> stable version RNN API this or next week in MKL-DNN 0.15.
> 
> We will have CPP unit test support on these operators and I am planning to
> write python unit tests to compare a RNN network's results from the MKLDNN
> backend with that of the GPU to test accuracy. Is there any additional 
> coverage
> that you think we should cover in the next two weeks?
> 
> Alex


RE: Project Proposal for fused CPU RNN OPs to the release 1.3

2018-06-22 Thread Zhao, Patric
Hello Steffen,

Thank you for looking into our proposal. I completely understand your concern 
that quality is the most important thing.
We will pay close attention to it.

Regarding the RNN ops, the new operators provide roughly a 2-3X performance 
boost (see the performance section of the proposal).
Most importantly, they allow Gluon RNN/NLP models to be hybridized into 
symbolic models on both CPU and GPU
(previously this only worked on GPU, as there were no fused CPU ops).
For correctness, unit tests were added with the PR and we also validated 
accuracy on real cases. 
I will update the doc with more information about correctness and our 
experiment results.

Regarding MKL-DNN integration, issues (bugs and some corner cases) have 
gradually surfaced as more users switch to it. 
It is absolutely important to make it stable, and we care deeply about 
this.
Zheng Da, Alex, and our team are working on the known issues and have already 
fixed many of them. 
Furthermore, a set of unit tests, covering Gluon, symbolic, and C++ cases, has 
been added to cover more situations.
The MKL-DNN backend is steadily becoming more complete and robust, and we hope 
to promote it to GA in 1.3.

Finally, I think we need some patience with new features while incubating them 
to maturity.

Thanks for your suggestions again.

--Patric

> -Original Message-
> From: Steffen Rochel [mailto:steffenroc...@gmail.com]
> Sent: Friday, June 22, 2018 10:45 PM
> To: dev@mxnet.incubator.apache.org
> Cc: Lv, Tao A ; Li, Hao H ; Ye,
> Jason Y ; Emani, Ashok 
> Subject: Re: Project Proposal for fused CPU RNN OPs to the release 1.3
> 
> Thanks Patric, appreciate  your contributions. I looked at your design
> proposal. I'm missing any statements about validation of correctness and
> performance of the integrated solution. I would suggest to pay more
> attention to this aspect as we struggled in previously releases with the
> quality of the integration. As you know, we still have too many issues on
> MKL-DNN integration to move from experimental to GA stage.
> Regards,
> Steffen
> 
> On Thu, Jun 21, 2018 at 12:09 AM Zhao, Patric 
> wrote:
> 
> > Hi MXNET owner,
> >
> > Recently, we (Intel engineers) have implemented the fused RNN
> > operations
> > (LSTM/GRU/vRNN) for the CPU, including bidirectional, multiple layers,
> > inference/training.
> > The LSTM and GRU PR was merged and vRNN code will be PR soon.
> >
> > The new APIs make the gluon and symbolic models much faster :)
> >
> > Thus, I have added a new row in the 1.3 proposal table and hope the
> > end user can leverage the new feature easily.
> >
> >
> >
> https://cwiki.apache.org/confluence/display/MXNET/Project+Proposals+fo
> > r+next+MXNet+Release
> >
> > Feel free to let me know for any feedbacks and suggestions.
> >
> > BR,
> >
> > Thanks,
> >
> > --Patric
> >
> >


RE: summary of 5/29 v1.2 post mortem & next release brainstorming call

2018-05-30 Thread Zhao, Patric
Hi Steffen,

It was a good meetup, and it's a pity I missed it.

Regarding the 1.3 proposal, we (the Intel team) would like to add two items to 
the list. 
Please help review:

1) fused RNN operators for CPU (GRU/LSTM/vRNN)
Lead contributor : Patric Zhao (https://github.com/pengzhao-intel/), Lv Tao 
(https://github.com/TaoLv)
Design Proposal,  
https://cwiki.apache.org/confluence/display/MXNET/Fused+RNN+Operators+for+CPU
BTW, LSTM is already merged, GRU is under the review.

2) distributed training by MPI AllReduce
Lead contributor : Patric Zhao (https://github.com/pengzhao-intel/), Ye 
zhouhai (https://github.com/threeleafzerg)
Design Proposal, 
https://cwiki.apache.org/confluence/display/MXNET/Extend+MXNet+Distributed+Training+by+MPI+AllReduce
BTW, we enabled distributed training for awslabs/sockeye and will submit 
the PR soon.
Discussion in here (https://github.com/awslabs/sockeye/issues/397)

Feel free to send me any feedback and suggestions :)

BR,

--Patric



> -Original Message-
> From: Steffen Rochel [mailto:steffenroc...@gmail.com]
> Sent: Thursday, May 31, 2018 1:17 AM
> To: dev@mxnet.incubator.apache.org
> Subject: summary of 5/29 v1.2 post mortem & next release brainstorming call
> 
> Thanks to everybody who attended the MXNet post mortem and release
> brainstorm call yesterday. We had 15 callers for the morning session and 3
> people in a room in the afternoon session.
> 
> Summary of the post mortem is captured here
>  Apache+MXNet+v1.2.0>.
> Please feel free to add and comment.
> Lots of good suggestions how to improve our CI process. Please contribute to
> the suggested improvements.
> 
> Proposal for the next release content and schedule is summarized here
>  ext+MXNet+Release>.
> Please review, refine and add links to design specs where needed.
> Taliesin from Wolfram Design offered to share experience writing good tests
> covering different data types and I think we all could learn from it and 
> improve
> the flaky tests. I will coordinate with Taliesin to setup a session.
> 
> 
> Q from the call:
> 
>- Taliesin: support for control flow, discuss on dev@, avoid unrolling
>of operators, enable reshaping of operators, PR was rejected, provide more
>details
>   - https://github.com/apache/incubator-mxnet/pull/8949
>- Anton: need to refine process for patch release (i.e. refinement of 
> Release
>Versioning and Branching
> 
>  +Branching>
> )
>- how and who decides about patch release
>- monthly call: yes (Haibin, Anton, Wolfram Research)
> 
> 
> I summarized the calls as well here
>  s> and will follow up with the suggestions.
> 
> Please provide feedback on preferred monthly call times, so we can minimize
> inconvenience while maximizing possible attendance. Please leave feedback at
> https://cwiki.apache.org/confluence/display/MXNET/Meetups+and+Hangouts
> or reply.
> 
> Time planer:
> https://www.timeanddate.com/worldclock/meetingtime.html?iso=20180530;
> p2=283=176=37=179=33
> Proposal | Time (UTC) | Vote (add name or X)
> A | 16 | X
> B | 15 | X
> C | 3 | X
> D | 2 | X
> (add proposals as needed)
> 
> Regards,
> Steffen


RE: MXNet meetup in Seattle April 24th

2018-04-16 Thread Zhao, Patric
+1 for providing the offline materials :)

> -Original Message-
> From: Hen [mailto:bay...@apache.org]
> Sent: Tuesday, April 17, 2018 12:09 PM
> To: dev@mxnet.incubator.apache.org
> Subject: Re: MXNet meetup in Seattle April 24th
> 
> Will there be a write up/presentation upload for those unable to attend?
> 
> On Sun, Apr 15, 2018 at 4:47 PM Steffen Rochel 
> wrote:
> 
> > All - updated agenda published at:
> > https://www.meetup.com/Apache-MXNet-Seattle-
> meetup/events/249178668/
> >
> > If you are interested and available, please RSVP if you haven't already.
> >
> > Steffen
> >


RE: blog for MXNet

2018-04-14 Thread Zhao, Patric
Yes, an English version would be very helpful.

Also, as far as I know, WeChat content is not searchable. 


> -Original Message-
> From: Hen [mailto:bay...@apache.org]
> Sent: Friday, April 13, 2018 10:50 PM
> To: dev@mxnet.incubator.apache.org
> Subject: Re: blog for MXNet
> 
> Pretty sad that “intel mkdnn mxnet” doesn’t find that in a google search; but
> perhaps that says more about the fragmentation of the internet and my out
> of date expectations :)
> 
> Is that being translated over to English (recognizing the assumption that
> western developers speak English)?
> 
> On Wed, Apr 11, 2018 at 8:02 PM Yida Wang <yid...@gmail.com> wrote:
> 
> > We have a WeChat official account ApacheMXNet and this is the latest
> post:
> >
> >
> https://mp.weixin.qq.com/s?__biz=MzU3NjUyOTU0OA===2247483669
> =1
> >
> =f6217700a69c3d70a91593560b94fd42=fd1334f6ca64bde07e4d
> 9271fb5
> > 49b897996c8638f061bea545b7ab19fb10d0c31973f3544e8#rd
> >
> > Yida
> >
> > On Wed, Apr 11, 2018 at 7:55 PM, Aaron Markham
> > <aaron.s.mark...@gmail.com>
> > wrote:
> >
> > > I think for China we need to cross-post to WeChat. Apparently, there
> > > is already blog post activity for MXNet there, and it would make
> > > sense to be on that platform directly.
> > >
> > > Can a China user access the Apache blog site that Hen mentioned?
> > > Also, I'm not familiar with the Apache blog and how you contribute.
> > > I don't see info about it on Confluence or elsewhere. Certainly
> > > sounds like something that needs some attention and to be part of
> > > regular communications.
> > >
> > >
> > > On Wed, Apr 11, 2018 at 6:21 PM, Zhao, Patric
> > > <patric.z...@intel.com>
> > > wrote:
> > > > FYI, China user can't access medium.com :(
> > > >
> > > >> -Original Message-
> > > >> From: Anirudh Acharya [mailto:anirudhk...@gmail.com]
> > > >> Sent: Thursday, April 12, 2018 6:31 AM
> > > >> To: dev@mxnet.incubator.apache.org
> > > >> Subject: Re: blog for MXNet
> > > >>
> > > >> There is already an AWS Evangelist, Julien Simon, who has quite a
> > > >> few
> > > posts
> > > >> about mxnet/gluon on medium - https://medium.com/@julsimon
> > > >>
> > > >>
> > > >> Regards
> > > >> Anirudh
> > > >>
> > > >> On Wed, Apr 11, 2018 at 3:27 PM, Sebastian Gutierrez <
> > > >> sebast...@aiworkbox.com> wrote:
> > > >>
> > > >> > Aaron and Thomas
> > > >> >
> > > >> > Great ideas!
> > > >> >
> > > >> > One thing worth also considering is something like
> > > >> >
> > > >> > https://www.r-bloggers.com/
> > > >> >
> > > >> > What it does is serve as a blog aggregation service for all of
> > > >> > the people who have blogged about r topics. Because of the
> > > >> > central repository nature, it serves as a natural gathering
> > > >> > point and allows people not using RSS (or similar technologies)
> > > >> > to keep up to date
> > > with what is
> > > >> happening.
> > > >> >
> > > >> > Another thing worth considering is a job board / site for MXNet
> > > >> > full time / part time / remote jobs.  The data vis community
> > > >> > has this
> > free
> > > >> > email list service
> > > >> > https://groups.google.com/forum/m/#!forum/data-vis-jobs that's
> > > >> > very community friendly and is a good place for people to
> > > >> > gather to see
> > job
> > > >> > needs.
> > > >> >
> > > >> > All the best
> > > >> > Sebastian Gutierrez
> > > >> >
> > > >> >
> > > >> > On Wed, Apr 11, 2018 at 6:10 PM Thomas DELTEIL
> > > >> > <thomas.delte...@gmail.com>
> > > >> > wrote:
> > > >> >
> > > >> > > Thanks Aaron, I like medium, a lot of projects seems to be
> > > >> > > posting their articles there, as you mentioned.
> > > >> > >
> > > >> > > Note that there is a newly created Chinese MXNet blog here:

RE: blog for MXNet

2018-04-11 Thread Zhao, Patric
FYI, China user can't access medium.com :( 

> -Original Message-
> From: Anirudh Acharya [mailto:anirudhk...@gmail.com]
> Sent: Thursday, April 12, 2018 6:31 AM
> To: dev@mxnet.incubator.apache.org
> Subject: Re: blog for MXNet
> 
> There is already an AWS Evangelist, Julien Simon, who has quite a few posts
> about mxnet/gluon on medium - https://medium.com/@julsimon
> 
> 
> Regards
> Anirudh
> 
> On Wed, Apr 11, 2018 at 3:27 PM, Sebastian Gutierrez <
> sebast...@aiworkbox.com> wrote:
> 
> > Aaron and Thomas
> >
> > Great ideas!
> >
> > One thing worth also considering is something like
> >
> > https://www.r-bloggers.com/
> >
> > What it does is serve as a blog aggregation service for all of the
> > people who have blogged about r topics. Because of the central
> > repository nature, it serves as a natural gathering point and allows
> > people not using RSS (or similar technologies) to keep up to date with what 
> > is
> happening.
> >
> > Another thing worth considering is a job board / site for MXNet full
> > time / part time / remote jobs.  The data vis community has this free
> > email list service
> > https://groups.google.com/forum/m/#!forum/data-vis-jobs that's very
> > community friendly and is a good place for people to gather to see job
> > needs.
> >
> > All the best
> > Sebastian Gutierrez
> >
> >
> > On Wed, Apr 11, 2018 at 6:10 PM Thomas DELTEIL
> > 
> > wrote:
> >
> > > Thanks Aaron, I like medium, a lot of projects seems to be posting
> > > their articles there, as you mentioned.
> > >
> > > Note that there is a newly created Chinese MXNet blog here:
> > > https://zh.mxnet.io/blog/
> > >
> > > I would be happy to contribute to the blogs, if you want to add me
> > > to the writer/editor list.
> > >
> > >
> > >
> > > Also, there is a mxnet subreddit r/mxnet which was created by
> > > Sebastian
> > > (thanks!) and I am now a moderator as well. Feel free to cross-post
> > > any interesting content there! https://www.reddit.com/r/mxnet/
> > > Please subscribe!
> > >
> > >
> > > I will try to post one link a day at least, until I run out of links
> > > ☺ We will also improve the look of it this week and add links to
> > > relevant resources on the side bar, etc.
> > >
> > >
> > >
> > > All the best,
> > >
> > >
> > > Thomas
> > >
> > >
> > > 2018-04-11 14:45 GMT-07:00 Aaron Markham
> :
> > >
> > > > Having a blog for MXNet would be very useful for conveying news,
> > > > talking about features, demoing applications, and building awareness.
> > > >
> > > > Does anyone have particular preferences or recommendations on blog
> > > > hosting or platform?
> > > >
> > > > I currently have editor access for an MXNet branded account on Medium.
> > > > https://medium.com/mxnet
> > > >
> > > > There's nothing there at the moment, but at least with Medium we
> > > > all could get started right away, and have a built-in syndication
> > > > platform. Also, note that this is where the TensorFlow blog resides:
> > > > https://medium.com/tensorflow
> > > >
> > > > Please make it known if you'd like to contribute, so you can get
> > > > writer/editor access (to whichever platform we settle on.)
> > > >
> > > > Cheers,
> > > > Aaron
> > > >
> > >
> >


RE: Extend MXNET distributed training with MPI AllReduce

2018-03-29 Thread Zhao, Patric
Actually, the current design structure is very similar to kvstore_nccl, as the 
attached picture shows.

I have also moved the proposal into a Google doc, where it is easier to add 
comments and make changes.

https://docs.google.com/document/d/1e4anwDiS18cWP49FAghU6tqqdtnRKUcbNJJxvhIfvIA/edit#heading=h.t762l56r1094

Thanks,

--Patric


From: Ye, Zhouhai
Sent: Tuesday, March 27, 2018 4:30 PM
To: 'Nan Zhu' <zhunanmcg...@gmail.com>; 'dev@mxnet.incubator.apache.org' 
<dev@mxnet.incubator.apache.org>
Cc: 'Li, Mu' <m...@amazon.com>; Lv, Tao A <tao.a...@intel.com>; Ma, Guokai 
<guokai...@intel.com>; 'Rahul Huilgol' <rahulhuil...@gmail.com>; Ye, Jason Y 
<jason.y...@intel.com>; Zhang, Rong A <rong.a.zh...@intel.com>; Zhao, Patric 
<patric.z...@intel.com>
Subject: RE: Extend MXNET distributed training with MPI AllReduce

For our current POC we implement option (b): add mpi.kvstore in Python, 
depending on a new MXNet submodule, mpi_collectives (a C++ library that 
depends on MXNet); i.e., a new type of kvstore added at the Python layer.

Note that mpi_collectives does not need to be a separate C++ library; its 
source code can be compiled directly into libmxnet.so.


From: Ye, Zhouhai
Sent: Tuesday, March 27, 2018 11:21 AM
To: Nan Zhu <zhunanmcg...@gmail.com<mailto:zhunanmcg...@gmail.com>>; 
dev@mxnet.incubator.apache.org<mailto:dev@mxnet.incubator.apache.org>
Cc: Li, Mu <m...@amazon.com<mailto:m...@amazon.com>>; Lv, Tao A 
<tao.a...@intel.com<mailto:tao.a...@intel.com>>; Ma, Guokai 
<guokai...@intel.com<mailto:guokai...@intel.com>>; Rahul Huilgol 
<rahulhuil...@gmail.com<mailto:rahulhuil...@gmail.com>>; Ye, Jason Y 
<jason.y...@intel.com<mailto:jason.y...@intel.com>>; Zhang, Rong A 
<rong.a.zh...@intel.com<mailto:rong.a.zh...@intel.com>>; Zhao, Patric 
<patric.z...@intel.com<mailto:patric.z...@intel.com>>
Subject: RE: Extend MXNET distributed training with MPI AllReduce

You can check the mpi.kvstore API spec in our design doc.

For example, we add pushpull and broadcast interfaces and disable the original 
push and pull in the new kvstore.

From: Ye, Zhouhai
Sent: Tuesday, March 27, 2018 11:18 AM
To: 'Nan Zhu' <zhunanmcg...@gmail.com<mailto:zhunanmcg...@gmail.com>>; 
dev@mxnet.incubator.apache.org<mailto:dev@mxnet.incubator.apache.org>
Cc: Li, Mu <m...@amazon.com<mailto:m...@amazon.com>>; Lv, Tao A 
<tao.a...@intel.com<mailto:tao.a...@intel.com>>; Ma, Guokai 
<guokai...@intel.com<mailto:guokai...@intel.com>>; Rahul Huilgol 
<rahulhuil...@gmail.com<mailto:rahulhuil...@gmail.com>>; Ye, Jason Y 
<jason.y...@intel.com<mailto:jason.y...@intel.com>>; Zhang, Rong A 
<rong.a.zh...@intel.com<mailto:rong.a.zh...@intel.com>>; Zhao, Patric 
<patric.z...@intel.com<mailto:patric.z...@intel.com>>
Subject: RE: Extend MXNET distributed training with MPI AllReduce

Hi,
Nan Zhu

As described in our design doc, there are two possible code structures 
(implementations); we currently implement the second in our POC:


a.   Implement mpi.kvstore at the same level as the current kvstores (C++ 
src/kvstore), adhering to the original kvstore factory pattern.


b.  Add mpi.kvstore in Python, depending on a new MXNet submodule, 
mpi_collectives (a C++ library that depends on MXNet); i.e., add a new type of 
kvstore at the Python layer.


For your second question: making a single communication submodule is feasible 
(as in option a), but a unified abstraction covering both the parameter server 
and AllReduce is very hard.


From: Nan Zhu [mailto:zhunanmcg...@gmail.com]
Sent: Tuesday, March 27, 2018 10:39 AM
To: dev@mxnet.incubator.apache.org<mailto:dev@mxnet.incubator.apache.org>
Cc: Li, Mu <m...@amazon.com<mailto:m...@amazon.com>>; Lv, Tao A 
<tao.a...@intel.com<mailto:tao.a...@intel.com>>; Ma, Guokai 
<guokai...@intel.com<mailto:guokai...@intel.com>>; Rahul Huilgol 
<rahulhuil...@gmail.com<mailto:rahulhuil...@gmail.com>>; Ye, Jason Y 
<jason.y...@intel.com<mailto:jason.y...@intel.com>>; Ye, Zhouhai 
<zhouhai...@intel.com<mailto:zhouhai...@intel.com>>; Zhang, Rong A 
<rong.a.zh...@intel.com<mailto:rong.a.zh...@intel.com>>; Zhao, Patric 
<patric.z...@intel.com<mailto:patric.z...@intel.com>>
Subject: Re: Extend MXNET distributed training with MPI AllReduce

Hi, Patric

It's pretty nice work!

A question:

What would the future code structure look like with this allreduce module as a 
submodule? Will we have two communication submodules?

Is there any plan for a unified communication abstraction so that a 
single communication submodule is possible?

Best,

Nan


On Mon, Mar 26, 2018 at 7:20 PM, Chris Olivier 
<cjolivie...@gmail.com<mailto:cjolivie...@gmail.com>> wrote:
great! nice work!

On Mon, Mar 26, 2018 at 6:31 PM Zhao, Patric 

Extend MXNET distributed training with MPI AllReduce

2018-03-26 Thread Zhao, Patric
Hi MXNET owners/developers,

As you know, AllReduce and Parameter Server are two very popular distributed 
training modes in DL.

Currently, MXNet only supports parameter server mode and lacks an AllReduce 
mode. Other frameworks, such as TensorFlow, PyTorch, and Caffe, can work with 
AllReduce.
Based on our analysis and experiments, AllReduce mode achieves better 
scalability and efficiency.

So, we propose to extend MXNET distributed training with MPI AllReduce mode.
We have implemented a AllReduce prototype in MXNET and the results are very 
positive.
AllReduce mode can get 94.7% scale efficiency by 8 compute nodes for VGG16 
while the Parameter Server requires totally 16 nodes (8 compute nodes + 8 
parameter severs) to reach 93.2%.

The whole proposal is available in MXNET wiki. Any feedback are highly 
appreciated.
https://cwiki.apache.org/confluence/display/MXNET/Extend+MXNet+Distributed+Training+by+MPI+AllReduce

Thanks in advance.

BR,

--Patric
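To make the AllReduce semantics in the proposal concrete, here is a minimal pure-Python illustration of a sum-allreduce over per-worker gradients. It is a conceptual sketch only, not the proposed C++/MPI implementation, and the function name `allreduce_sum` is hypothetical.

```python
def allreduce_sum(grads):
    """Sum-allreduce: every worker ends up with the element-wise sum of all
    workers' gradient vectors. This naive version only illustrates the
    semantics; real MPI_Allreduce implementations use ring/tree schedules so
    each network link carries roughly 2*(N-1)/N of the data rather than
    funneling everything through one node."""
    total = [sum(vals) for vals in zip(*grads)]
    return [list(total) for _ in grads]

# Three workers, each holding a local gradient of length 2:
workers = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
print(allreduce_sum(workers))  # every worker holds [9.0, 12.0]
```

Because every node both computes and communicates, no separate parameter-server nodes are needed, which is where the node-count saving quoted above comes from.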



RE: call for contributions to next MXNet release

2018-03-23 Thread Zhao, Patric
Many thanks to Marco, Da, and the other reviewers for their help :)

I'd like to update the status of MKL-DNN bugs. 
Feel free to let me know if there're any other issues.

There are 8 open issues (1 discussion thread not included).

2 issues are WIP and will be completed in the next several days.
#10189, Race condition when MKLDNN is enabled MKL, Zheng Da
Note: this data race exists in theory but cannot actually happen now. Da is 
working on an enhancement to fix it completely.
#8712, test_operator.test_depthwise_convolution fails in [Python2: MKLML-CPU] 
and [Python3: MKLML-CPU], Patric Zhao

6 open issues have been resolved and are waiting for review/merge/close:
#10168 (PR#10218) CPP test case fails to compile using gcc 4.8.5 when MKLDNN 
enabled Breaking Build MKL
#10141 (Passed, to be closed)  Flaky test 
test_operator_gpu.test_convolution_options @ Python2: MKLDNN-GPU Flaky MKL Test
#10092 (PR#10021) No "full" pooling convention support with MKL-DNN Feature 
request MKL Operator
#10026 (PR#10069) MXNET_MKLDNN_DEBUG=1 produces errors Bug MKL
#8974 (Question, will update doc) MXNET compatibility with MKL libraries 
bundled in Microsoft R Open MKL
#8532 (resolved by workaround) mxnet-mkl (v0.12.0) crash when using 
(conda-installed) numpy with MKL Bug MKL
Note: A duplicated libiomp.so library caused the conflict in conda; it is not 
introduced by MKL-DNN. It can be resolved with an environment variable; I am 
looking for a C API to set it at runtime.

> -Original Message-
> From: Marco de Abreu [mailto:marco.g.ab...@googlemail.com]
> Sent: Saturday, March 17, 2018 1:37 AM
> To: dev@mxnet.incubator.apache.org
> Subject: Re: call for contributions to next MXNet release
> 
> Hi Patric,
> 
> I have added three more issues:
> https://github.com/apache/incubator-mxnet/issues/10131
> https://github.com/apache/incubator-mxnet/issues/10133
> https://github.com/apache/incubator-mxnet/issues/10134
> 
> Also, you missed https://github.com/apache/incubator-mxnet/issues/8712 -
> sorry, it was not labelled before
> 
> Best regards,
> Marco
> 
> On Fri, Mar 16, 2018 at 7:45 AM, Zhao, Patric <patric.z...@intel.com>
> wrote:
> 
> > MKL issues summary:
> >
> > Feel free to let me know if anything I missed.
> >
> > Totally, there’re 11 open issues  in the github with the label of MKL
> > as
> > below:
> >  4 Bugs issues: #10092, #10086,  #10026, #8881
> >  4 Building issues: #9993, #9828, #8974, #8532
> >  1 Flaky Tests: #9864
> >  2 wrong label and invalidated:  #9844, #8874
> >
> > Current status.
> > 3 DONE, to be closed.
> > 4 WIP (working in progress)
> > 2 TODO
> > 2 are invalidated now.
> >
> > Details:
> > #10092, WIP, PR#10021
> > #10086, TODO
> > #10026, WIP
> > #9993, DONE (need to be closed), fixed by #10075
> > #9864, TODO
> > #9844, INVALIDATED, not related w/ MKL
> > #9828, DONE (need to be closed), fixed by #9918 & #10115
> > #8974, WIP
> > #8881, DONE (need to be closed), fixed by #9112
> > #8874, INVALIDATED, “I believe, we are not using MKL”
> > #8532, WIP, library conflict and can be resolved by environment
> > setting
> >
> > Lists:
> > No "full" pooling convention support with MKL-DNN Feature request MKL
> > Operator
> > #10092 opened 2 days ago by marcoabreu
> > https://github.com/apache/incubator-mxnet/issues/10092
> >
> > [MXNET-84] Segfault test_autograd.test_unary_func @ Python3:
> > MKLDNN-CPU Bug MKL Test
> > #10086 opened 3 days ago by marcoabreu
> > https://github.com/apache/incubator-mxnet/issues/10086
> >
> > MXNET_MKLDNN_DEBUG=1 produces errors Bug MKL
> > #10026 opened 8 days ago by marcoabreu
> > https://github.com/apache/incubator-mxnet/issues/10026
> >
> > cmake cannot build mxnet Bug Build MKL
> > #9993 opened 11 days ago by jacky4323
> > https://github.com/apache/incubator-mxnet/issues/9993
> >
> > Flaky hanging test_operator.test_laop_3 @ Python3: MKLDNN-CPU Flaky
> > MKL Test
> > #9864 opened 21 days ago by marcoabreu
> > https://github.com/apache/incubator-mxnet/issues/9864
> >
> > Flaky test_operator_gpu.test_binary_op @ Python3: MKLDNN-GPU Flaky
> MKL
> > Test
> > #9844 opened 23 days ago by marcoabreu
> > https://github.com/apache/incubator-mxnet/issues/9844
> >
> > Building with MKL fails on OSX Build MKL
> > #9828 opened 25 days ago by sbodenstein
> > https://github.com/apache/incubator-mxnet/issues/9828
> >
> > MXNET compatibility with MKL libraries bundled in Microsoft R Open MKL
> > #8974 opened on Dec 7, 2017 by mjmg
> > https://github.com/apache/incubator-mxnet/issues/8974
>

MKLDNN Building Discussion in MXNET

2018-03-20 Thread Zhao, Patric
Hi MXNET developers,

Since MKL-DNN was integrated into the MXNet master branch last month, we have 
seen some confusion about how to build MKL-DNN and Intel MKL into MXNet.
Several GitHub issues were opened and most of them have been fixed, but I 
think we still need to define a clear flow for further development.

We are starting this thread to discuss the best build flow. A slide deck is 
attached to the GitHub issue:
https://github.com/apache/incubator-mxnet/issues/10175

Any suggestions and comments (or PRs) are highly appreciated.

BR,

Thanks,

--Patric



RE: call for contributions to next MXNet release

2018-03-16 Thread Zhao, Patric
MKL issues summary:

Feel free to let me know if anything I missed.

In total, there are 11 open issues on GitHub with the MKL label, as below:
 4 bug issues: #10092, #10086, #10026, #8881
 4 building issues: #9993, #9828, #8974, #8532
 1 flaky test: #9864
 2 wrongly labeled and invalidated: #9844, #8874

Current status:
3 DONE, to be closed.
4 WIP (work in progress)
2 TODO
2 are invalidated now.

Details: 
#10092, WIP, PR#10021 
#10086, TODO
#10026, WIP
#9993, DONE (need to be closed), fixed by #10075
#9864, TODO 
#9844, INVALIDATED, not related w/ MKL
#9828, DONE (need to be closed), fixed by #9918 & #10115
#8974, WIP
#8881, DONE (need to be closed), fixed by #9112
#8874, INVALIDATED, “I believe, we are not using MKL”
#8532, WIP, library conflict and can be resolved by environment setting

Lists:
No "full" pooling convention support with MKL-DNN Feature request MKL Operator
#10092 opened 2 days ago by marcoabreu 
https://github.com/apache/incubator-mxnet/issues/10092
 
[MXNET-84] Segfault test_autograd.test_unary_func @ Python3: MKLDNN-CPU Bug MKL 
Test
#10086 opened 3 days ago by marcoabreu 
https://github.com/apache/incubator-mxnet/issues/10086

MXNET_MKLDNN_DEBUG=1 produces errors Bug MKL
#10026 opened 8 days ago by marcoabreu 
https://github.com/apache/incubator-mxnet/issues/10026
 
cmake cannot build mxnet Bug Build MKL
#9993 opened 11 days ago by jacky4323 
https://github.com/apache/incubator-mxnet/issues/9993
 
Flaky hanging test_operator.test_laop_3 @ Python3: MKLDNN-CPU Flaky MKL Test
#9864 opened 21 days ago by marcoabreu 
https://github.com/apache/incubator-mxnet/issues/9864

Flaky test_operator_gpu.test_binary_op @ Python3: MKLDNN-GPU Flaky MKL Test
#9844 opened 23 days ago by marcoabreu 
https://github.com/apache/incubator-mxnet/issues/9844

Building with MKL fails on OSX Build MKL
#9828 opened 25 days ago by sbodenstein 
https://github.com/apache/incubator-mxnet/issues/9828

MXNET compatibility with MKL libraries bundled in Microsoft R Open MKL
#8974 opened on Dec 7, 2017 by mjmg 
https://github.com/apache/incubator-mxnet/issues/8974

Intel MKL FATAL ERROR: Cannot load libmkl_avx512_mic.so or libmkl_def.so. Bug 
Data-loading MKL
#8881 opened on Nov 30, 2017 by wuzhijiexia 
https://github.com/apache/incubator-mxnet/issues/8881

mxnet installation from source: C++ linkage error on HPC C++ Installation MKL
#8874 opened on Nov 30, 2017 by jerrin92 
https://github.com/apache/incubator-mxnet/issues/8874

mxnet-mkl (v0.12.0) crash when using (conda-installed) numpy with MKL Bug MKL
#8532 opened on Nov 3, 2017 by fhieber
https://github.com/apache/incubator-mxnet/issues/8532


> -Original Message-
> From: Zheng, Da [mailto:dzz...@amazon.com]
> Sent: Tuesday, March 13, 2018 2:06 AM
> To: Zhao, Patric <patric.z...@intel.com>; steffenroc...@gmail.com;
> dev@mxnet.incubator.apache.org
> Cc: marco.g.ab...@googlemail.com; Ye, Jason Y <jason.y...@intel.com>
> Subject: Re: call for contributions to next MXNet release
> 
> Patric, thanks for summarizing open issues.
> 
> Since I'm in the Palo Alto office, maybe we don't need to wait until Apr 24th?
> 
> Best,
> Da
> 
> On 3/11/18, 6:43 AM, "Zhao, Patric" <patric.z...@intel.com> wrote:
> 
> >- should we talk about the package at the Apr 24th meetup in Seattle?
> We're based in Shanghai, China; so maybe @Da Zheng?



RE: MKLDNN Build (pre: call for contributions to next MXNet release)

2018-03-15 Thread Zhao, Patric
+ Resending with the source data formatted so it reads clearly.

> -Original Message-
> From: Zhao, Patric [mailto:patric.z...@intel.com]
> Sent: Friday, March 16, 2018 8:29 AM
> To: 'dev@mxnet.incubator.apache.org' <dev@mxnet.incubator.apache.org>
> Cc: Huang, Jin1 <jin1.hu...@intel.com>; Da Zheng <zhengda1...@gmail.com>
> Subject: RE: MKLDNN Build (pre: call for contributions to next MXNet release)
> 
> Hi Pedro,
> 
> 
> 
> We and Zheng Da tested the performance on Mac laptop (Intel i7 CPU) as
> below table.
> 
> The test script is example/image-classification/benchmark_score.py.
> 
> 
> 
> MKLDNN shows better performance in MAC too.
> 
> 
> topo            batch size   gcc7 MKLDNN=1   gcc7 MKLDNN=0   speedup
> alexnet              1           38.44           24.49         157%
> alexnet              2           50.18           29.62         169%
> alexnet              4           59.54           35.40         168%
> alexnet              8           70.75           32.49         218%
> alexnet             16           73.40           34.93         210%
> alexnet             32           47.52           37.19         128%
> vgg16                1            3.80            2.13         178%
> vgg16                2            3.52            2.93         120%
> vgg16                4            4.54            3.01         151%
> vgg16                8            5.12            3.05         168%
> vgg16               16            5.37            3.12         172%
> vgg16               32            5.36            3.07         175%
> inception-bn         1           15.02           14.04         107%
> inception-bn         2           25.32           15.14         167%
> inception-bn         4           34.92           14.97         233%
> inception-bn         8           27.96           15.02         186%
> inception-bn        16           27.05           15.10         179%
> inception-bn        32           32.29           15.46         209%
> inception v3         1           10.03            5.89         170%
> inception v3         2           11.21            5.99         187%
> inception v3         4           13.04            5.89         221%
> inception v3         8           12.97            6.01         216%
> inception v3        16           12.55            5.13         245%
> inception v3        32           12.52            5.65         222%
> resnet-50            1           15.53            7.07         220%
> resnet-50            2           14.85            7.06         210%
> resnet-50            4           17.08            7.88         217%
> resnet-50            8           16.91            8.97         189%
> resnet-50           16           15.55            9.42         165%
> resnet-50           32           16.60            7.99         208%
> resnet-152           1            5.90            3.55         166%
> resnet-152           2            5.64            3.23         175%
> resnet-152           4            6.11            3.08         198%
> resnet-152           8            5.22            3.58         146%
> resnet-152          16            5.28            3.86         137%
> resnet-152          32            6.20            3.72         167%
> 
> > -Original Message-
> > From: Zhao, Patric
> > Sent: Wednesday, March 14, 2018 9:33 PM
> > To: dev@mxnet.incubator.apache.org
> > Cc: Huang, Jin1 <jin1.hu...@intel.com>
> > Subject: MKLDNN Build (pre: call for contributions to next MXNet release)
> >
> > Hi Pedro,
> >
> > Thanks for the good suggestions. I plan to update the install page but a
> > slight delay by other things.
> >
> > Actually, I didn't test the performance on OSX since we're lack of Mac
> > machine.
> >
> > I will try if I can get a Mac in the office.
> >
> > Thanks,
> >
> > --Patric
> >
> > > -Original Message-
> > > From: Pedro Larroy [mailto:pedro.larroy.li...@gmail.com]
> > > Sent: Wednesday, March 14, 2018 9:09 PM
> > > To: dev@mxnet.incubator.apache.org
> > > Subject: Re: call for contributions to next MXNet release
> > >
> > > Hi Patric
> > >
> > > Does it make sense to add instructions to link against mkl in the
> > > install docs?
> > >
> > > For example in OSX you can use MKL by downloading
> > > https://github.com/intel/mkl-dnn/releases and adding MKL_RT_LIBRARY
> > > and MKL_INCLUDE_DIR plus -DUSE_MKL_IF_AVAILABLE=ON
> > > -DUSE_MKLML_MKL=ON -DUSE_MKLDNN=ON
> > >
> > > This is not documented in the website.
> > &g

RE: MKLDNN Build (pre: call for contributions to next MXNet release)

2018-03-15 Thread Zhao, Patric
Hi Pedro,



We and Zheng Da tested the performance on a Mac laptop (Intel i7 CPU); see the 
table below.

The test script is example/image-classification/benchmark_score.py.



MKLDNN shows better performance on Mac too.


topo            batch size   gcc7 MKLDNN=1   gcc7 MKLDNN=0   speedup
alexnet              1           38.44           24.49         157%
alexnet              2           50.18           29.62         169%
alexnet              4           59.54           35.40         168%
alexnet              8           70.75           32.49         218%
alexnet             16           73.40           34.93         210%
alexnet             32           47.52           37.19         128%
vgg16                1            3.80            2.13         178%
vgg16                2            3.52            2.93         120%
vgg16                4            4.54            3.01         151%
vgg16                8            5.12            3.05         168%
vgg16               16            5.37            3.12         172%
vgg16               32            5.36            3.07         175%
inception-bn         1           15.02           14.04         107%
inception-bn         2           25.32           15.14         167%
inception-bn         4           34.92           14.97         233%
inception-bn         8           27.96           15.02         186%
inception-bn        16           27.05           15.10         179%
inception-bn        32           32.29           15.46         209%
inception v3         1           10.03            5.89         170%
inception v3         2           11.21            5.99         187%
inception v3         4           13.04            5.89         221%
inception v3         8           12.97            6.01         216%
inception v3        16           12.55            5.13         245%
inception v3        32           12.52            5.65         222%
resnet-50            1           15.53            7.07         220%
resnet-50            2           14.85            7.06         210%
resnet-50            4           17.08            7.88         217%
resnet-50            8           16.91            8.97         189%
resnet-50           16           15.55            9.42         165%
resnet-50           32           16.60            7.99         208%
resnet-152           1            5.90            3.55         166%
resnet-152           2            5.64            3.23         175%
resnet-152           4            6.11            3.08         198%
resnet-152           8            5.22            3.58         146%
resnet-152          16            5.28            3.86         137%
resnet-152          32            6.20            3.72         167%
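For context, throughput numbers like the images/sec figures above come from timing repeated forward passes and dividing images processed by elapsed wall-clock time. The following is a minimal self-contained sketch of that measurement loop; a pure-Python dense layer stands in for a real network, so it does not reproduce benchmark_score.py or its results.

```python
import time

def benchmark(batch_size, n_iter=50, dim=64):
    """Time a dummy 'forward pass' (a dense layer as pure-Python dot
    products) and report images/sec, in the spirit of
    example/image-classification/benchmark_score.py."""
    x = [[0.5] * dim for _ in range(batch_size)]   # fake input batch
    w = [[0.25] * dim for _ in range(dim)]         # fake weight matrix

    def forward():
        return [[sum(xi * wi for xi, wi in zip(row, col)) for col in w]
                for row in x]

    forward()  # warm-up so one-time costs don't skew the timing
    start = time.perf_counter()
    for _ in range(n_iter):
        forward()
    elapsed = time.perf_counter() - start
    return batch_size * n_iter / elapsed  # images/sec

for bs in (1, 2, 4):
    print("batch %d: %.1f images/sec" % (bs, benchmark(bs)))
```

The warm-up pass and the fixed iteration count are the two details that matter most when comparing builds (e.g. MKLDNN=1 vs MKLDNN=0), since first-call initialization would otherwise distort small-batch numbers.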






> -Original Message-
> From: Zhao, Patric
> Sent: Wednesday, March 14, 2018 9:33 PM
> To: dev@mxnet.incubator.apache.org
> Cc: Huang, Jin1 <jin1.hu...@intel.com>
> Subject: MKLDNN Build (pre: call for contributions to next MXNet release)
>
> Hi Pedro,
>
> Thanks for the good suggestions. I plan to update the install page but a
> slight delay by other things.
>
> Actually, I didn't test the performance on OSX since we're lack of Mac
> machine.
>
> I will try if I can get a Mac in the office.
>
> Thanks,
>
> --Patric
>
> > -Original Message-
> > From: Pedro Larroy [mailto:pedro.larroy.li...@gmail.com]
> > Sent: Wednesday, March 14, 2018 9:09 PM
> > To: dev@mxnet.incubator.apache.org
> > Subject: Re: call for contributions to next MXNet release
> >
> > Hi Patric
> >
> > Does it make sense to add instructions to link against mkl in the
> > install docs?
> >
> > For example in OSX you can use MKL by downloading
> > https://github.com/intel/mkl-dnn/releases and adding MKL_RT_LIBRARY
> > and MKL_INCLUDE_DIR plus -DUSE_MKL_IF_AVAILABLE=ON
> > -DUSE_MKLML_MKL=ON -DUSE_MKLDNN=ON
> >
> > This is not documented in the website.
> >
> > What's the performance increase of using MKL in osx when running in
> > CPU mode?
> >
> > Pedro
> >
> > On Wed, Mar 14, 2018 at 1:54 PM, Zhao, Patric <patric.z...@intel.com>
> > wrote:
> >
> > > My fault, typo for your name, Larroy  :(
> > >
> > > > -Original Message-
> > > > From: Zhao, Patric [mailto:patric.z...@intel.com]
> > > > Sent: Wednesday, March 14, 2018 8:40 PM
> > > > To: dev@mxnet.incubator.apache.org; Huang, Jin1
> > > > <jin1.hu...@intel.com>
> > > > Subject: RE: call for contributions to next MXNet release
> > > >
> > > > Thanks, Larryo.
> > > >
> > > > This is a PR to fix this issue,
> > > > https://github.com/apache/incubator-mxnet/pull/10075
> > > > The issue will be gone after this PR is merged.
> > > >
> > > > For the whole logic, we are investigating now and maybe propose a
> > > > new approach for Makefile and CMake.
> > > >
> > > > > -Original Message-
> > > > > From: Pedro Larroy [mailto:pedro.larroy.li...@gmail.com]
> > > > > Sent: Wednesday, March 14, 2018 8:31 PM
> > > > > To: dev@mxnet.incubator.apache.org
> > > > > Subject: Re: call for contributions to next MXNet release
> > > > >
> > > > > Hi
> > > > >
> > > > > Can you please fix the build logic?
> > > > > https://github.com/apache/incubator-mxnet/blob/master/CMakeLists.txt#L158
> > > > >
> > > > > https://github.com/apache/incubator-mxnet/issues/10072
> > > > >
> > > > > It wrongly assumes that you have MKL installed, MKLDNN needs to
> > > > > check if MKL is available before, or be disabled by default,
> > > > > also in non intel platforms.
> > > > >
> > > > > Pedro
> > > > >
> > > > > On Sun, Mar 11, 2018 at 2:43 PM, Zhao, Patric
> > > > > <patric.z...@intel.com>
> > > 

MKLDNN Build (pre: call for contributions to next MXNet release)

2018-03-14 Thread Zhao, Patric
Hi Pedro,

Thanks for the good suggestions. I plan to update the install page, though it 
has been slightly delayed by other work.

Actually, I haven't tested the performance on OSX since we lack a Mac machine.

I will try to get a Mac in the office.

Thanks,

--Patric


> -Original Message-
> From: Pedro Larroy [mailto:pedro.larroy.li...@gmail.com]
> Sent: Wednesday, March 14, 2018 9:09 PM
> To: dev@mxnet.incubator.apache.org
> Subject: Re: call for contributions to next MXNet release
> 
> Hi Patric
> 
> Does it make sense to add instructions to link against mkl in the install 
> docs?
> 
> For example in OSX you can use MKL by downloading
> https://github.com/intel/mkl-dnn/releases and adding MKL_RT_LIBRARY and
> MKL_INCLUDE_DIR plus  -DUSE_MKL_IF_AVAILABLE=ON -
> DUSE_MKLML_MKL=ON -DUSE_MKLDNN=ON
> 
> This is not documented in the website.
> 
> What's the performance increase of using MKL in osx when running in CPU
> mode?
> 
> Pedro
> 
> On Wed, Mar 14, 2018 at 1:54 PM, Zhao, Patric <patric.z...@intel.com>
> wrote:
> 
> > My fault, typo for your name, Larroy  :(
> >
> > > -Original Message-
> > > From: Zhao, Patric [mailto:patric.z...@intel.com]
> > > Sent: Wednesday, March 14, 2018 8:40 PM
> > > To: dev@mxnet.incubator.apache.org; Huang, Jin1
> > > <jin1.hu...@intel.com>
> > > Subject: RE: call for contributions to next MXNet release
> > >
> > > Thanks, Larryo.
> > >
> > > This is a PR to fix this issue, https://github.com/apache/incubator-
> > > mxnet/pull/10075
> > > The issue will be gone after this PR is merged.
> > >
> > > For the whole logic, we are investigating now and maybe propose a
> > > new approach for Makefile and CMake.
> > >
> > > > -Original Message-
> > > > From: Pedro Larroy [mailto:pedro.larroy.li...@gmail.com]
> > > > Sent: Wednesday, March 14, 2018 8:31 PM
> > > > To: dev@mxnet.incubator.apache.org
> > > > Subject: Re: call for contributions to next MXNet release
> > > >
> > > > Hi
> > > >
> > > > Can you please fix the build logic?
> > > > https://github.com/apache/incubator-
> > > > mxnet/blob/master/CMakeLists.txt#L158
> > > >
> > > > https://github.com/apache/incubator-mxnet/issues/10072
> > > >
> > > > It wrongly assumes that you have MKL installed, MKLDNN needs to
> > > > check if MKL is available before, or be disabled by default, also
> > > > in non
> > intel
> > > platforms.
> > > >
> > > > Pedro
> > > >
> > > > On Sun, Mar 11, 2018 at 2:43 PM, Zhao, Patric
> > > > <patric.z...@intel.com>
> > > > wrote:
> > > >
> > > > > Steffen, really thanks for the helps and I totally agree with
> > > > > you and Marco's suggestions.
> > > > > I will summarize the open issues and status for the review soon.
> > > > >
> > > > > > What is your github handle?
> > > > > Github ID:  Patric Zhao, https://github.com/pengzhao-intel/ Da
> > Zheng,
> > > > > https://github.com/zheng-da/
> > > > >
> > > > > > - do you have design docs we can link to? Does the doc cover
> > > > > > packaging,
> > > > > use
> > > > > > of MXNet with full MKL (I understand there are license issues,
> > > > > > but we do
> > > > > have
> > > > > > users who are or plan to use MXNet with the complete MKL
> > > > > > package, not
> > > > > just
> > > > > > DNN subset)?
> > > > > Design doc:
> > > > > https://cwiki.apache.org/confluence/display/MXNET/The+
> > > > > design+of+MKLDNN+integration
> > > > > It's a good suggestion for the full MKL package and we will add
> > > > > the related part in the doc.
> > > > >
> > > > > > - do you have performance measurements (or plan to measure) to
> > > > > > include in release notes?
> > > > > Yes, we have tested the performance on all C4/C5 instances
> > > > > (8x,4x,2x,xlarge,large). And the accuracy are verified as well.
> > > > > We plan to publish these data both in the release note and mxnet
> > > > > website ( https://mxnet.incubator.apache.org/faq/perf.html).
> > > > >
> > > > > > - should we talk

RE: call for contributions to next MXNet release

2018-03-14 Thread Zhao, Patric
My fault, typo for your name, Larroy  :(  

> -Original Message-
> From: Zhao, Patric [mailto:patric.z...@intel.com]
> Sent: Wednesday, March 14, 2018 8:40 PM
> To: dev@mxnet.incubator.apache.org; Huang, Jin1 <jin1.hu...@intel.com>
> Subject: RE: call for contributions to next MXNet release
> 
> Thanks, Larryo.
> 
> This is a PR to fix this issue, https://github.com/apache/incubator-
> mxnet/pull/10075
> The issue will be gone after this PR is merged.
> 
> For the whole logic, we are investigating now and maybe propose a new
> approach for Makefile and CMake.
> 
> > -Original Message-
> > From: Pedro Larroy [mailto:pedro.larroy.li...@gmail.com]
> > Sent: Wednesday, March 14, 2018 8:31 PM
> > To: dev@mxnet.incubator.apache.org
> > Subject: Re: call for contributions to next MXNet release
> >
> > Hi
> >
> > Can you please fix the build logic?
> > https://github.com/apache/incubator-
> > mxnet/blob/master/CMakeLists.txt#L158
> >
> > https://github.com/apache/incubator-mxnet/issues/10072
> >
> > It wrongly assumes that you have MKL installed, MKLDNN needs to check
> > if MKL is available before, or be disabled by default, also in non intel
> platforms.
> >
> > Pedro
> >
> > On Sun, Mar 11, 2018 at 2:43 PM, Zhao, Patric <patric.z...@intel.com>
> > wrote:
> >
> > > Steffen, really thanks for the helps and I totally agree with you
> > > and Marco's suggestions.
> > > I will summarize the open issues and status for the review soon.
> > >
> > > > What is your github handle?
> > > Github ID:  Patric Zhao, https://github.com/pengzhao-intel/ Da Zheng,
> > > https://github.com/zheng-da/
> > >
> > > > - do you have design docs we can link to? Does the doc cover
> > > > packaging,
> > > use
> > > > of MXNet with full MKL (I understand there are license issues, but
> > > > we do
> > > have
> > > > users who are or plan to use MXNet with the complete MKL package,
> > > > not
> > > just
> > > > DNN subset)?
> > > Design doc:  https://cwiki.apache.org/confluence/display/MXNET/The+
> > > design+of+MKLDNN+integration
> > > It's a good suggestion for the full MKL package and we will add the
> > > related part in the doc.
> > >
> > > > - do you have performance measurements (or plan to measure) to
> > > > include in release notes?
> > > Yes, we have tested the performance on all C4/C5 instances
> > > (8x,4x,2x,xlarge,large). And the accuracy are verified as well.
> > > We plan to publish these data both in the release note and mxnet
> > > website ( https://mxnet.incubator.apache.org/faq/perf.html).
> > >
> > > > - should we talk about the package at the Apr 24th meetup in Seattle?
> > > We're based in Shanghai, China; so maybe @Da Zheng?
> > >
> > >
> > > > -Original Message-
> > > > From: Steffen Rochel [mailto:steffenroc...@gmail.com]
> > > > Sent: Sunday, March 11, 2018 6:11 PM
> > > > To: dev@mxnet.incubator.apache.org
> > > > Cc: marco.g.ab...@googlemail.com
> > > > Subject: Re: call for contributions to next MXNet release
> > > >
> > > > Patric - added MKL-DNN to the project list. What is your github handle?
> > > > I do agree with Marco that we need to resolve disable tests,
> > > > broken
> > > features
> > > > etc. Jira looks like the right answer to create a list of known issues.
> > > > Questions I do have:
> > > > - do you have design docs we can link to? Does the doc cover
> > > > packaging,
> > > use
> > > > of MXNet with full MKL (I understand there are license issues, but
> > > > we do
> > > have
> > > > users who are or plan to use MXNet with the complete MKL package,
> > > > not
> > > just
> > > > DNN subset)?
> > > > - do you have performance measurements (or plan to measure) to
> > > > include in release notes?
> > > > - should we talk about the package at the Apr 24th meetup in Seattle?
> > > >
> > > > Steffen
> > > >
> > > > On Sat, Mar 10, 2018 at 4:40 AM Zhao, Patric
> > > > <patric.z...@intel.com>
> > > > wrote:
> > > >
> > > > > Hi Marco,
> > > > >
> > > > > Thanks for the inputs.
> > > > >
> > &g

RE: call for contributions to next MXNet release

2018-03-14 Thread Zhao, Patric
Thanks, Larryo. 

This is a PR to fix this issue, 
https://github.com/apache/incubator-mxnet/pull/10075
The issue will be gone after this PR is merged.

For the whole build logic, we are investigating now and may propose a new 
approach for the Makefile and CMake.

> -Original Message-
> From: Pedro Larroy [mailto:pedro.larroy.li...@gmail.com]
> Sent: Wednesday, March 14, 2018 8:31 PM
> To: dev@mxnet.incubator.apache.org
> Subject: Re: call for contributions to next MXNet release
> 
> Hi
> 
> Can you please fix the build logic?
> https://github.com/apache/incubator-
> mxnet/blob/master/CMakeLists.txt#L158
> 
> https://github.com/apache/incubator-mxnet/issues/10072
> 
> It wrongly assumes that you have MKL installed, MKLDNN needs to check if
> MKL is available before, or be disabled by default, also in non intel 
> platforms.
> 
> Pedro
> 
> On Sun, Mar 11, 2018 at 2:43 PM, Zhao, Patric <patric.z...@intel.com>
> wrote:
> 
> > Steffen, really thanks for the helps and I totally agree with you and
> > Marco's suggestions.
> > I will summarize the open issues and status for the review soon.
> >
> > > What is your github handle?
> > Github ID:  Patric Zhao, https://github.com/pengzhao-intel/ Da Zheng,
> > https://github.com/zheng-da/
> >
> > > - do you have design docs we can link to? Does the doc cover
> > > packaging,
> > use
> > > of MXNet with full MKL (I understand there are license issues, but
> > > we do
> > have
> > > users who are or plan to use MXNet with the complete MKL package,
> > > not
> > just
> > > DNN subset)?
> > Design doc:  https://cwiki.apache.org/confluence/display/MXNET/The+
> > design+of+MKLDNN+integration
> > It's a good suggestion for the full MKL package and we will add the
> > related part in the doc.
> >
> > > - do you have performance measurements (or plan to measure) to
> > > include in release notes?
> > Yes, we have tested the performance on all C4/C5 instances
> > (8x,4x,2x,xlarge,large). And the accuracy are verified as well.
> > We plan to publish these data both in the release note and mxnet
> > website ( https://mxnet.incubator.apache.org/faq/perf.html).
> >
> > > - should we talk about the package at the Apr 24th meetup in Seattle?
> > We're based in Shanghai, China; so maybe @Da Zheng?
> >
> >
> > > -Original Message-
> > > From: Steffen Rochel [mailto:steffenroc...@gmail.com]
> > > Sent: Sunday, March 11, 2018 6:11 PM
> > > To: dev@mxnet.incubator.apache.org
> > > Cc: marco.g.ab...@googlemail.com
> > > Subject: Re: call for contributions to next MXNet release
> > >
> > > Patric - added MKL-DNN to the project list. What is your github handle?
> > > I do agree with Marco that we need to resolve disable tests, broken
> > features
> > > etc. Jira looks like the right answer to create a list of known issues.
> > > Questions I do have:
> > > - do you have design docs we can link to? Does the doc cover
> > > packaging,
> > use
> > > of MXNet with full MKL (I understand there are license issues, but
> > > we do
> > have
> > > users who are or plan to use MXNet with the complete MKL package,
> > > not
> > just
> > > DNN subset)?
> > > - do you have performance measurements (or plan to measure) to
> > > include in release notes?
> > > - should we talk about the package at the Apr 24th meetup in Seattle?
> > >
> > > Steffen
> > >
> > > On Sat, Mar 10, 2018 at 4:40 AM Zhao, Patric <patric.z...@intel.com>
> > > wrote:
> > >
> > > > Hi Marco,
> > > >
> > > > Thanks for the inputs.
> > > >
> > > > MKL-DNN is just merged to the master branch in a month. I agree
> > > > that it's not very mature.
> > > > But, in my mind, there're NO major issues in the current
> > implementation.
> > > >
> > > > The previous data race (flaky test) issue is fixed by Zheng Da.
> > > > We have submitted the PRs to fix the building issues and setup
> > > > Clang CI environment, such as in OSX and Cmake.
> > > > Meanwhile, we are actively working on the MXNET for the
> > > > performance, functionality and usability improvements.
> > > >
> > > > More positively, I think we should summarize the open issues, and
> > > > we can focus on these issues you've mentioned.
> > > >
> > &

RE: call for contributions to next MXNet release

2018-03-11 Thread Zhao, Patric
Steffen, many thanks for the help; I fully agree with your and Marco's 
suggestions.
I will summarize the open issues and their status for review soon.

> What is your github handle?
Github ID:  Patric Zhao, https://github.com/pengzhao-intel/ Da Zheng, 
https://github.com/zheng-da/

> - do you have design docs we can link to? Does the doc cover packaging, use
> of MXNet with full MKL (I understand there are license issues, but we do have
> users who are or plan to use MXNet with the complete MKL package, not just
> DNN subset)?  
Design doc:  
https://cwiki.apache.org/confluence/display/MXNET/The+design+of+MKLDNN+integration
The full MKL package is a good suggestion, and we will add the related part to 
the doc.

> - do you have performance measurements (or plan to measure) to include in
> release notes?
Yes, we have tested the performance on all C4/C5 instances (8x, 4x, 2x, 
xlarge, large), and the accuracy is verified as well.
We plan to publish these data both in the release notes and on the MXNet 
website (https://mxnet.incubator.apache.org/faq/perf.html).
  
> - should we talk about the package at the Apr 24th meetup in Seattle?
We're based in Shanghai, China; so maybe @Da Zheng?  


> -Original Message-
> From: Steffen Rochel [mailto:steffenroc...@gmail.com]
> Sent: Sunday, March 11, 2018 6:11 PM
> To: dev@mxnet.incubator.apache.org
> Cc: marco.g.ab...@googlemail.com
> Subject: Re: call for contributions to next MXNet release
> 
> Patric - added MKL-DNN to the project list. What is your github handle?
> I do agree with Marco that we need to resolve disable tests, broken features
> etc. Jira looks like the right answer to create a list of known issues.
> Questions I do have:
> - do you have design docs we can link to? Does the doc cover packaging, use
> of MXNet with full MKL (I understand there are license issues, but we do have
> users who are or plan to use MXNet with the complete MKL package, not just
> DNN subset)?
> - do you have performance measurements (or plan to measure) to include in
> release notes?
> - should we talk about the package at the Apr 24th meetup in Seattle?
> 
> Steffen
> 
> On Sat, Mar 10, 2018 at 4:40 AM Zhao, Patric <patric.z...@intel.com>
> wrote:
> 
> > Hi Marco,
> >
> > Thanks for the inputs.
> >
> > MKL-DNN is just merged to the master branch in a month. I agree that
> > it's not very mature.
> > But, in my mind, there're NO major issues in the current implementation.
> >
> > The previous data race (flaky test) issue is fixed by Zheng Da.
> > We have submitted the PRs to fix the building issues and setup Clang
> > CI environment, such as in OSX and Cmake.
> > Meanwhile, we are actively working on the MXNET for the performance,
> > functionality and usability improvements.
> >
> > More positively, I think we should summarize the open issues, and we
> > can focus on these issues you've mentioned.
> >
> > Thanks,
> >
> > --Patric
> >
> > > -Original Message-
> > > From: Marco de Abreu [mailto:marco.g.ab...@googlemail.com]
> > > Sent: Saturday, March 10, 2018 9:36 AM
> > > To: dev@mxnet.incubator.apache.org
> > > Subject: Re: call for contributions to next MXNet release
> > >
> > > Hello Patric,
> > >
> > > please be aware of the fact that there are still a lot of disabled
> > tests, open
> > > MKLDNN issues, broken features and flaky tests that have not been
> > > addressed yet. We agreed on merging MKLDNN for the time being to
> > > allow broad testing and that we will revisit the state a bit before
> > > the next
> > release.
> > > At the moment, I'd not be in favour of having MKLDNN being part of it.
> > >
> > > Best regards,
> > > Marco
> > >
> > > Zhao, Patric <patric.z...@intel.com> schrieb am Sa., 10. März 2018,
> > 02:30:
> > >
> > > > Hi Steffen,
> > > >
> > > > We'd like the MKL-DNN backend can be included in the next release.
> > > >
> > > > We (Intel engineers)  and Zheng-Da (AWS engineer)  can work on it.
> > > >
> > > > Could you help add the item in the table?
> > > >
> > > > Thanks,
> > > >
> > > > --Patric
> > > >
> > > >
> > > > > -Original Message-
> > > > > From: Steffen Rochel [mailto:steffenroc...@gmail.com]
> > > > > Sent: Friday, March 9, 2018 9:29 PM
> > > > > To: dev@mxnet.incubator.apache.org
> > > > > Subject:

RE: call for contributions to next MXNet release

2018-03-10 Thread Zhao, Patric
Hi Marco,

Thanks for the inputs.

MKL-DNN was merged into the master branch only a month ago, and I agree that it's 
not very mature.
But, in my mind, there are NO major issues in the current implementation.

The previous data race (flaky test) issue was fixed by Zheng Da.
We have submitted PRs to fix the build issues and set up a Clang CI 
environment, such as on OSX and with CMake.
Meanwhile, we are actively working on MXNet performance, 
functionality and usability improvements.

More constructively, I think we should summarize the open issues so we can focus 
on the ones you've mentioned.

Thanks,

--Patric

> -Original Message-
> From: Marco de Abreu [mailto:marco.g.ab...@googlemail.com]
> Sent: Saturday, March 10, 2018 9:36 AM
> To: dev@mxnet.incubator.apache.org
> Subject: Re: call for contributions to next MXNet release
> 
> Hello Patric,
> 
> please be aware of the fact that there are still a lot of disabled tests, open
> MKLDNN issues, broken features and flaky tests that have not been
> addressed yet. We agreed on merging MKLDNN for the time being to allow
> broad testing and that we will revisit the state a bit before the next 
> release.
> At the moment, I'd not be in favour of having MKLDNN being part of it.
> 
> Best regards,
> Marco
> 
> Zhao, Patric <patric.z...@intel.com> schrieb am Sa., 10. März 2018, 02:30:
> 
> > Hi Steffen,
> >
> > We'd like the MKL-DNN backend can be included in the next release.
> >
> > We (Intel engineers)  and Zheng-Da (AWS engineer)  can work on it.
> >
> > Could you help add the item in the table?
> >
> > Thanks,
> >
> > --Patric
> >
> >
> > > -Original Message-
> > > From: Steffen Rochel [mailto:steffenroc...@gmail.com]
> > > Sent: Friday, March 9, 2018 9:29 PM
> > > To: dev@mxnet.incubator.apache.org
> > > Subject: call for contributions to next MXNet release
> > >
> > > Hi - I would like to propose the next MXNet release.
> > > Initial draft content is listed at MXNet wiki
> > > <https://cwiki.apache.org/confluence/display/MXNET/Project+Proposals
> > > +fo
> > > r+next+MXNet+Release>.
> > > I would like to call to all contributors to add features you are
> > > working
> > on and
> > > would like to see included in the release. Suggested code freeze is
> > > March 30th  with target release by mid April.
> > >
> > > Anirudh Subramanian and Chris Olivier have volunteered to co-manage
> > > the release.
> > >
> > > Mark your date: we are planning a public meetup on Apr 24th in Seattle.
> > > Details to follow.
> > >
> > > Regards,
> > > Steffen
> >


RE: call for contributions to next MXNet release

2018-03-09 Thread Zhao, Patric
Hi Steffen,

We'd like the MKL-DNN backend to be included in the next release.

We (Intel engineers) and Zheng-Da (AWS engineer) can work on it.

Could you help add the item in the table?

Thanks,

--Patric


> -Original Message-
> From: Steffen Rochel [mailto:steffenroc...@gmail.com]
> Sent: Friday, March 9, 2018 9:29 PM
> To: dev@mxnet.incubator.apache.org
> Subject: call for contributions to next MXNet release
> 
> Hi - I would like to propose the next MXNet release.
> Initial draft content is listed at MXNet wiki
> <https://cwiki.apache.org/confluence/display/MXNET/Project+Proposals+for+next+MXNet+Release>.
> I would like to call to all contributors to add features you are working on 
> and
> would like to see included in the release. Suggested code freeze is March
> 30th  with target release by mid April.
> 
> Anirudh Subramanian and Chris Olivier have volunteered to co-manage the
> release.
> 
> Mark your date: we are planning a public meetup on Apr 24th in Seattle.
> Details to follow.
> 
> Regards,
> Steffen


RE: Assign JIRA issue

2018-03-08 Thread Zhao, Patric
@Chris, please add me into the group too.

> -Original Message-
> From: YiZhi Liu [mailto:liuyi...@apache.org]
> Sent: Friday, March 9, 2018 8:40 AM
> To: dev@mxnet.incubator.apache.org
> Subject: Re: Assign JIRA issue
> 
> Yes it is. Thanks!
> 
> 2018-03-08 16:04 GMT-08:00 Chris Olivier :
> > Done. Assume you meant your apache.org onew (there there two)
> >
> > On Thu, Mar 8, 2018 at 3:47 PM, YiZhi Liu  wrote:
> >
> >> Hi Chris,
> >>
> >> could you help to add me to the permission group (committer)?
> >>
> >> Thanks,
> >> Yizhi
> >>
> >> 2018-03-07 11:11 GMT-08:00 Chris Olivier :
> >> > yeah I fixed that
> >> >
> >> > On Wed, Mar 7, 2018 at 11:09 AM, Marco de Abreu <
> >> > marco.g.ab...@googlemail.com> wrote:
> >> >
> >> >> Would it be possible to make tickets assigned to Nobody by default?
> >> >>
> >> >> On Wed, Mar 7, 2018 at 6:15 PM, Chris Olivier
> >> >> 
> >> >> wrote:
> >> >>
> >> >> > Please assign JIRA issue to yourself after creating.  I think it
> >> >> > was assigning to me for some reason.  Just click "assign to me" link.
> >> >> >
> >> >> > If you don't have permissions, please email me directly (or even
> >> better,
> >> >> > ping on slack) and I will add you to a permission group (let me
> >> >> > know
> >> if
> >> >> > you're a committer or contributor).
> >> >> >
> >> >>
> >>
> >>
> >>
> >> --
> >> Yizhi Liu
> >> DMLC member
> >> Amazon Web Services
> >> Vancouver, Canada
> >>
> 
> 
> 
> --
> Yizhi Liu
> DMLC member
> Amazon Web Services
> Vancouver, Canada


Could anyone help add me to the Slack group?

2018-02-02 Thread Zhao, Patric
Thanks,

--Patric



RE: Intel Plan for the contribution to MXNET

2018-01-31 Thread Zhao, Patric
Thanks, Jun, please see my comments inline.

Wenting and Jin will follow up on the tasks in the PR.

From: Jun Wu [mailto:wujun@gmail.com]
Sent: Thursday, February 1, 2018 12:40 PM
To: dev@mxnet.incubator.apache.org
Cc: Ye, Jason Y <jason.y...@intel.com>; Lv, Tao A <tao.a...@intel.com>; Jiang, 
Wenting <wenting.ji...@intel.com>; Zhao, Patric <patric.z...@intel.com>
Subject: Re: Intel Plan for the contribution to MXNET

Hi Patric,

Thanks for the contribution. It’s great to see actions on developing INT8 
inference for CPU! I have a few questions and hope to have your answers.


1.  When you said your work is aligned with 
PR9552<https://github.com/apache/incubator-mxnet/pull/9552>, did you mean you 
used the quantization+calibration flows developed in that PR for benchmarking 
inference?

[Patric] The benchmark accuracy is based on MKLDNN and ziheng’s old 
quantization branch.

Now we have merged to master (based on #8302) together with the 
quantization+calibration PR for INT8 development, and will show you the accuracy 
and performance soon.



2.  In your MNIST benchmark, what operators are quantized?

[Patric] Conv, relu and flatten are quantized in our MNIST benchmark 
(conv+relu+flatten+FC+softmax).

Besides, MKLDNN supports pooling, concat and fused (conv with relu/elementwise/bn) 
INT8 ops.



3.  Is the MNIST quantized model calibrated?

[Patric] Not yet; we did the experiment on ziheng's old quantization branch, and 
are now moving to the quantization+calibration PR branch.



4.  Is the inference accuracy of INT8 produced by the calibrated quantized 
model, or just quantized model without calibration?

[Patric] Without calibration
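
To make the with/without-calibration distinction concrete, here is a minimal numpy sketch of symmetric INT8 quantization (all function names here are illustrative, not MXNet or MKL-DNN APIs): without calibration the scale comes from the raw absolute maximum, while a calibrated threshold lets rare outliers saturate so the bulk of the values get finer resolution.

```python
import numpy as np

def quantize_int8(x, threshold):
    # Symmetric linear quantization: map [-threshold, threshold] onto [-127, 127].
    scale = 127.0 / threshold
    q = np.clip(np.round(x * scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) / scale

rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, 100_000).astype(np.float32)

# Without calibration: the scale is set by the single largest |x|,
# so one outlier stretches the range and coarsens every quantization step.
q_raw, s_raw = quantize_int8(x, np.abs(x).max())
err_raw = np.abs(dequantize(q_raw, s_raw) - x).mean()

# With a calibrated threshold (here a simple 99.9th-percentile rule):
# the rare outliers saturate, but typical values are represented more finely.
thr = np.percentile(np.abs(x), 99.9)
q_cal, s_cal = quantize_int8(x, thr)
err_cal = np.abs(dequantize(q_cal, s_cal) - x).mean()
```

Real calibration strategies are more sophisticated (e.g. entropy-based threshold search), but the trade-off sketched here is presumably the one the calibration step in the PR addresses.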



5.  What are the throughputs of FP32 model and INT8 model for inference, 
respectively?

[Patric] At this stage, we are mainly focused on accuracy and the algorithm. 
Performance fine-tuning is on the way ☺

Thanks,
Jun

On Wed, Jan 31, 2018 at 8:08 PM, Zhao, Patric 
<patric.z...@intel.com<mailto:patric.z...@intel.com>> wrote:
Hi MXNET developers,

We are from Intel Software and Service Group (SSG) and working on the 
performance optimization for MXNET on Intel Architecture (IA).

Let me give a brief introduction to our ongoing projects.

Any suggestions and comments are highly appreciated.


1)  MKL-DNN integration with new NNVM interface

Together with Zheng-Da, we have designed a new MKL-DNN interface based on NNVM.

The new implementation shows better performance and flexibility than the old 
MKL engine.



The PR is under review (https://github.com/apache/incubator-mxnet/pull/8302); 
many thanks for your great comments in the thread :)

After the PR is merged, we will push more MKL-DNN related features and 
performance optimization strategies, such as fused conv + relu OP for the 
inference.
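
As background on why fusing conv + relu helps, here is a toy numpy illustration (illustrative only; MKL-DNN's actual fused primitives work at a much lower level): the unfused version materializes the convolution output and then makes a second pass that allocates a new array, while the fused version applies the activation in place while the output is still hot.

```python
import numpy as np

def conv_relu_unfused(x, w):
    y = np.convolve(x, w, mode="valid")  # pass 1: write the conv output
    return np.maximum(y, 0.0)            # pass 2: re-read y, allocate a new array

def conv_relu_fused(x, w):
    # "Fused": the activation is applied in place on the conv output,
    # avoiding the extra allocation and the second read of y.
    y = np.convolve(x, w, mode="valid")
    np.maximum(y, 0.0, out=y)
    return y

rng = np.random.default_rng(0)
x = rng.normal(size=4096)
w = rng.normal(size=9)
out_a = conv_relu_unfused(x, w)
out_b = conv_relu_fused(x, w)
```

Both versions compute identical results; the saving is in memory traffic and allocations, which is exactly where fused inference ops gain.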



2)  INT8 inference

MKL-DNN also provides INT8 computations for ops such as conv, relu and pooling, 
which can improve inference performance a lot with only a very slight accuracy 
drop (typically <1%).

Currently, we have implemented quantization, de-quantization, and some 
computing ops in a local branch.

Our latest implementation is aligned with this PR 
(https://github.com/apache/incubator-mxnet/pull/9552) and passed the unit test.



For a simple network (conv+relu+flatten+FC+softmax) on the MNIST dataset, we got 
very similar inference accuracy (FP32: 98.06% vs. INT8: 97.93%).

We will update a summary of our solution in this PR soon.



I hope the CPU and GPU paths can be compatible and share a common code base. 
So, I think we need more discussion in the PR :)



3)  RNN implementations

Currently, there is no CPU implementation for mx.sym.rnn, and the Python 
implementation is really slow.

We are working on resolving this issue from two aspects:

-  Provide a C/C++-level implementation, registered via FCompute 
(the GPU code should be moved to NNVM as well).

We plan to submit the LSTM/GRU PR in March; our initial results are below, FYI.
Size: N = 12, T = 1600, I = 161, H = 1760 (from the first layer of Deep Speech 2)

Forward (s)         | mx.sym.gru bound to Intel GRU C | Native mx.rnn.GRUCell
SKX 6148, 2 socket  | 1.32                            | 72.7




-  Provide the MKL-DNN RNN interface (under development, 
https://github.com/intel/mkl-dnn/issues/46), registered via FComputeEx

A higher-performance RNN implementation is under development by the MKL-DNN 
team, and we will merge it when it's ready.

I think CPU users can get a further performance boost from the MKL-DNN library.
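
For readers who want to relate the benchmark dimensions (N, T, I, H) to the computation, here is a minimal numpy sketch of a textbook GRU forward pass. This is the standard formulation, not the Intel C implementation being benchmarked, and the gate packing order is an arbitrary assumption.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_forward(x, h0, Wx, Wh, b):
    # x: (T, N, I) inputs, h0: (N, H) initial state.
    # Wx: (I, 3H), Wh: (H, 3H), b: (3H,), gates packed as [reset | update | new].
    T, N, I = x.shape
    H = h0.shape[1]
    h = h0
    for t in range(T):
        gx = x[t] @ Wx + b              # input projection, (N, 3H)
        gh = h @ Wh                     # hidden projection, (N, 3H)
        r = sigmoid(gx[:, :H] + gh[:, :H])            # reset gate
        z = sigmoid(gx[:, H:2*H] + gh[:, H:2*H])      # update gate
        n = np.tanh(gx[:, 2*H:] + r * gh[:, 2*H:])    # candidate state
        h = (1 - z) * n + z * h
    return h

# Tiny shapes for illustration; the benchmark above used N=12, T=1600, I=161, H=1760.
rng = np.random.default_rng(0)
N, T, I, H = 2, 4, 3, 5
x = rng.normal(size=(T, N, I))
h = gru_forward(x, np.zeros((N, H)), rng.normal(size=(I, 3 * H)),
                rng.normal(size=(H, 3 * H)), np.zeros(3 * H))
```

The per-step matrix products over (I, 3H) and (H, 3H) dominate the cost, which is why a fused C/MKL-DNN implementation that batches and vectorizes them beats a Python-loop GRUCell by such a wide margin.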

 Thanks in advance!

 BR,

-- Patric



Intel Plan for the contribution to MXNET

2018-01-31 Thread Zhao, Patric
Hi MXNET developers,

We are from Intel Software and Service Group (SSG) and working on the 
performance optimization for MXNET on Intel Architecture (IA).

Let me give a brief introduction to our ongoing projects.

Any suggestions and comments are highly appreciated.


1)  MKL-DNN integration with new NNVM interface

Together with Zheng-Da, we have designed a new MKL-DNN interface based on NNVM.

The new implementation shows better performance and flexibility than the old 
MKL engine.



The PR is under review (https://github.com/apache/incubator-mxnet/pull/8302); 
many thanks for your great comments in the thread :)

After the PR is merged, we will push more MKL-DNN related features and 
performance optimization strategies, such as fused conv + relu OP for the 
inference.



2)  INT8 inference

MKL-DNN also provides INT8 computations for ops such as conv, relu and pooling, 
which can improve inference performance a lot with only a very slight accuracy 
drop (typically <1%).

Currently, we have implemented quantization, de-quantization, and some 
computing ops in a local branch.

Our latest implementation is aligned with this PR 
(https://github.com/apache/incubator-mxnet/pull/9552) and passed the unit test.



For a simple network (conv+relu+flatten+FC+softmax) on the MNIST dataset, we got 
very similar inference accuracy (FP32: 98.06% vs. INT8: 97.93%).

We will update a summary of our solution in this PR soon.



I hope the CPU and GPU paths can be compatible and share a common code base. 
So, I think we need more discussion in the PR :)



3)  RNN implementations

Currently, there is no CPU implementation for mx.sym.rnn, and the Python 
implementation is really slow.

We are working on resolving this issue from two aspects:

-  Provide a C/C++-level implementation, registered via FCompute 
(the GPU code should be moved to NNVM as well).

We plan to submit the LSTM/GRU PR in March; our initial results are below, FYI.
Size: N = 12, T = 1600, I = 161, H = 1760 (from the first layer of Deep Speech 2)

Forward (s)         | mx.sym.gru bound to Intel GRU C | Native mx.rnn.GRUCell
SKX 6148, 2 socket  | 1.32                            | 72.7




-  Provide the MKL-DNN RNN interface (under development, 
https://github.com/intel/mkl-dnn/issues/46), registered via FComputeEx

A higher-performance RNN implementation is under development by the MKL-DNN 
team, and we will merge it when it's ready.

I think CPU users can get a further performance boost from the MKL-DNN library.

 Thanks in advance!

 BR,

-- Patric


