Re: Include MKLDNN into default mxnet pip package

2018-12-11 Thread Naveen Swamy
Great effort Alex and also folks from Intel.

+1 to make MKLDNN default.

On Tue, Dec 11, 2018 at 9:10 AM Kumar, Vikas wrote:

> +1
>
> On 12/10/18, 8:01 PM, "Zhao, Patric"  wrote:
>
> +1, thanks for the efforts, Alex.
>
>
>
> > -Original Message-
> > From: Alex Zai [mailto:aza...@gmail.com]
> > Sent: Tuesday, December 11, 2018 8:00 AM
> > To: dev@mxnet.incubator.apache.org
> > Subject: Include MKLDNN into default mxnet pip package
> >
> > Continuation from the following thread:
> > https://lists.apache.org/thread.html/bcb1bd5046ff51049a0556098e756578f6fa6564831d77fddb56432f@%3Cdev.mxnet.apache.org%3E
> >
> > I am also +1 for making it the default on master and testing until 1.5.0.
> > We can decide later on (before 1.5.0) to enable mkldnn as default for the
> > nightly build (pip install --pre build) to try to get more feedback if
> > needed.
> >
> > - What the story is like when there's no AVX instructions present on CPUs.
> > Do we get an illegal instruction error, or does it fallback gracefully?
> > According to this issue
> > (https://github.com/apache/incubator-mxnet/issues/11911), AVX2 is the
> > minimum requirement for pre-built binaries.
> >
> > - Are there any outstanding issues when MKLDNN is enabled?
> > There is one issue with int8 quantization in mkldnn (I will create an issue
> > about it when the team gives me a reproducible code snippet). Additionally,
> > we are waiting to merge the PR to build mkldnn statically on mac/linux when
> > building from source after MKL is added to the CI.
> >
> > - MKLDNN is a submodule dependency, are we pulling the latest commit or
> > releases? If not we should move to releases before we make it a default
> > I agree. We should tag mxnet only to releases from now on. Currently it is
> > tagged to 0.17.1.
> >
> > Please let me know if there are any other outstanding issues; otherwise we
> > are going to make mkldnn / cmake default in the Make/CMakefile.
> >
> > Alex
>
>
>


Re: Include MKLDNN into default mxnet pip package

2018-12-11 Thread Kumar, Vikas
+1

On 12/10/18, 8:01 PM, "Zhao, Patric"  wrote:

+1, thanks for the efforts, Alex.



> -Original Message-
> From: Alex Zai [mailto:aza...@gmail.com]
> Sent: Tuesday, December 11, 2018 8:00 AM
> To: dev@mxnet.incubator.apache.org
> Subject: Include MKLDNN into default mxnet pip package
>
> Continuation from the following thread:
> https://lists.apache.org/thread.html/bcb1bd5046ff51049a0556098e756578f6fa6564831d77fddb56432f@%3Cdev.mxnet.apache.org%3E
>
> I am also +1 for making it the default on master and testing until 1.5.0.
> We can decide later on (before 1.5.0) to enable mkldnn as default for the
> nightly build (pip install --pre build) to try to get more feedback if
> needed.
>
> - What the story is like when there's no AVX instructions present on CPUs.
> Do we get an illegal instruction error, or does it fallback gracefully?
> According to this issue
> (https://github.com/apache/incubator-mxnet/issues/11911), AVX2 is the
> minimum requirement for pre-built binaries.
>
> - Are there any outstanding issues when MKLDNN is enabled?
> There is one issue with int8 quantization in mkldnn (I will create an issue
> about it when the team gives me a reproducible code snippet). Additionally,
> we are waiting to merge the PR to build mkldnn statically on mac/linux when
> building from source after MKL is added to the CI.
>
> - MKLDNN is a submodule dependency, are we pulling the latest commit or
> releases? If not we should move to releases before we make it a default
> I agree. We should tag mxnet only to releases from now on. Currently it is
> tagged to 0.17.1.
>
> Please let me know if there are any other outstanding issues; otherwise we
> are going to make mkldnn / cmake default in the Make/CMakefile.
> 
> Alex




RE: Include MKLDNN into default mxnet pip package

2018-12-10 Thread Zhao, Patric
+1, thanks for the efforts, Alex.



> -Original Message-
> From: Alex Zai [mailto:aza...@gmail.com]
> Sent: Tuesday, December 11, 2018 8:00 AM
> To: dev@mxnet.incubator.apache.org
> Subject: Include MKLDNN into default mxnet pip package
> 
> Continuation from the following thread:
> https://lists.apache.org/thread.html/bcb1bd5046ff51049a0556098e756578f6fa6564831d77fddb56432f@%3Cdev.mxnet.apache.org%3E
>
> I am also +1 for making it the default on master and testing until 1.5.0.
> We can decide later on (before 1.5.0) to enable mkldnn as default for the
> nightly build (pip install --pre build) to try to get more feedback if
> needed.
>
> - What the story is like when there's no AVX instructions present on CPUs.
> Do we get an illegal instruction error, or does it fallback gracefully?
> According to this issue
> (https://github.com/apache/incubator-mxnet/issues/11911), AVX2 is the
> minimum requirement for pre-built binaries.
>
> - Are there any outstanding issues when MKLDNN is enabled?
> There is one issue with int8 quantization in mkldnn (I will create an issue
> about it when the team gives me a reproducible code snippet). Additionally,
> we are waiting to merge the PR to build mkldnn statically on mac/linux when
> building from source after MKL is added to the CI.
>
> - MKLDNN is a submodule dependency, are we pulling the latest commit or
> releases? If not we should move to releases before we make it a default
> I agree. We should tag mxnet only to releases from now on. Currently it is
> tagged to 0.17.1.
>
> Please let me know if there are any other outstanding issues; otherwise we
> are going to make mkldnn / cmake default in the Make/CMakefile.
> 
> Alex


Include MKLDNN into default mxnet pip package

2018-12-10 Thread Alex Zai
Continuation from the following thread:
https://lists.apache.org/thread.html/bcb1bd5046ff51049a0556098e756578f6fa6564831d77fddb56432f@%3Cdev.mxnet.apache.org%3E

I am also +1 for making it the default on master and testing until 1.5.0.
We can decide later on (before 1.5.0) to enable mkldnn as default for the
nightly build (pip install --pre build) to try to get more feedback if
needed.

- What the story is like when there's no AVX instructions present on CPUs.
Do we get an illegal instruction error, or does it fallback gracefully?
According to this issue
(https://github.com/apache/incubator-mxnet/issues/11911), AVX2 is the
minimum requirement for pre-built binaries.

- Are there any outstanding issues when MKLDNN is enabled?
There is one issue with int8 quantization in mkldnn (I will create an issue
about it when the team gives me a reproducible code snippet). Additionally,
we are waiting to merge the PR to build mkldnn statically on mac/linux when
building from source after MKL is added to the CI.

- MKLDNN is a submodule dependency, are we pulling the latest commit or
releases? If not we should move to releases before we make it a default
I agree. We should tag mxnet only to releases from now on. Currently it is
tagged to 0.17.1.

Please let me know if there are any other outstanding issues; otherwise we
are going to make mkldnn / cmake default in the Make/CMakefile.

Alex
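
For readers who want to check the AVX2 point above on their own machine,
here is a minimal sketch. It assumes a Linux x86 host where /proc/cpuinfo
has a "flags" line; the helper names cpu_flags and supports are made up for
illustration and are not part of MXNet or MKL-DNN.

```python
def cpu_flags(cpuinfo_text):
    """Collect feature flags from /proc/cpuinfo-style text (Linux x86 layout assumed)."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # Only lines of the form "flags : fpu vme ... avx avx2" are of interest.
        if sep and key.strip() == "flags":
            flags.update(value.split())
    return flags


def supports(cpuinfo_text, feature):
    """Return True if the named feature (for example "avx2") is listed."""
    return feature.lower() in cpu_flags(cpuinfo_text)


# Typical use on a Linux machine:
#     with open("/proc/cpuinfo") as f:
#         print(supports(f.read(), "avx2"))
```

A machine that reports avx but not avx2 would, going by the issue linked
above, fall below the minimum for the pre-built binaries.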


Re: LSTM regression (was RE: Include MKLDNN into default mxnet pip package)

2018-11-28 Thread Zai, Alexander
I ran the benchmark and it addresses the issue. Thanks.

On 11/28/18, 6:02 PM, "Zhao, Patric"  wrote:

MKL-DNN v0.17.1 is released https://github.com/intel/mkl-dnn/tree/v0.17.1

I have submitted the PR to pin this release version.

Thanks,

--Patric

> -Original Message-
> From: Zhao, Patric [mailto:patric.z...@intel.com]
> Sent: Wednesday, November 28, 2018 8:07 PM
> To: dev@mxnet.incubator.apache.org
> Subject: LSTM regression (was RE: Include MKLDNN into default mxnet pip
> package)
> 
> Hi Anirudh,
> 
> The LSTM performance bug has been fixed by MKL-DNN, and the PR is here
> (https://github.com/apache/incubator-mxnet/pull/13417).
>
> I am still working with the MKL-DNN team to get a patch release for MXNet
> 1.4 in 1 or 2 days.
> 
> Will update the status soon.
> 
> Thanks everyone.
> 
> --Patric
> 
> > -Original Message-
> > From: Anirudh Subramanian [mailto:anirudh2...@gmail.com]
> > Sent: Tuesday, November 27, 2018 6:16 AM
> > To: dev@mxnet.incubator.apache.org
> > Subject: Re: Include MKLDNN into default mxnet pip package
> >
> > Hi Tao,
> >
> > I agree with Steffen that we can start with a stable release for
> > MKLDNN for 1.4.0. For your suggestion on using 0.17, can you provide
> > info on what versioning mechanism MKLDNN uses. Once a MKLDNN release
> > is out and there are some regressions found like the LSTM regression,
> > would it be possible to do a patch release for it or maintain a release
> > branch for it ?
> >
> > Anirudh
> >
> > On Sun, Nov 25, 2018 at 5:03 PM Lv, Tao A  wrote:
> >
> > > Hi Steffen,
> > >
> > > I think all the commits on MKL-DNN master branch are well tested for
> > > MKL-DNN development team. If we really want to have a release commit
> > > in the coming 1.4 mxnet release, my suggestion is 0.17 MKL-DNN 
release.
> > >
> > > Thank you,
> > > Tao
> > >
> > > Sent from my iPhone
> > >
> > > > On Nov 26, 2018, at 8:09 AM, Steffen Rochel
> > > > 
> > > wrote:
> > > >
> > > > +1 to make MKL-DNN default.
> > > > I'm tracking
> > > > https://github.com/apache/incubator-mxnet/issues/13369
> > > > as open issue to be addressed for 1.4.0 I do agree that we should
> > > > move to a model to include released
> > > dependencies
> > > > instead of just taking bleeding edge snapshots.
> > > > However, speed of development is important as well.
> > > > As a compromise for 1.4.0 release with MKL-DNN: can the MKL-DNN
> > > development
> > > > team provide us with a well tested tag/commit id to include in
> > > > 1.4.0 release?
> > > > Steffen
> > > >
> > > >> On Wed, Nov 21, 2018 at 11:42 PM Lv, Tao A 
> > wrote:
> > > >>
> > > >> Thanks for the information, Kellen and Naveen.
> > > >>
> > > >> Better than onnx-tensorrt, MKL-DNN has already provided
> > > >> versioning and release tags. My concern is that as MKL-DNN is
> > > >> still under intensive development, if it has a new feature or bug
> > > >> fix on its master branch,
> > > do we
> > > >> really want to wait for next release to get it supported in MXNet?
> > > >>
> > > >> Take the LSTM regression as an example, probably MKL-DNN will
> > > >> give a fix or improvement on its master branch soon, do we need
> > > >> to wait for 0.18 release to get it fixed for mxnet user? AFAIK,
> > > >> tensorflow is also using normal commit id, not release, as the
> > > >> dependency for MKL-DNN.
> > > >>
> > > >> Regarding the LSTM regression, we are using internal JIRA tickets
> > > >> rather than github issues to track the defects of MKL-DNN. But I
> > > >> agree with
> > > you,
> > > >> we need update the progress of it in Alex's issue.
> > > >>
> > > >> Thanks,
> > > >> -tao
> > > >>
> > > >> -Original Message-

RE: LSTM regression (was RE: Include MKLDNN into default mxnet pip package)

2018-11-28 Thread Zhao, Patric
MKL-DNN v0.17.1 is released https://github.com/intel/mkl-dnn/tree/v0.17.1

I have submitted the PR to pin this release version.

Thanks,

--Patric
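
One way to confirm that a checkout is pinned to a release rather than to an
arbitrary commit is to look at what `git describe --tags` prints inside the
submodule directory: for a commit that is exactly a tag it prints the bare
tag name, otherwise it appends "-<count>-g<hash>". A rough sketch under that
assumption follows. The function name is hypothetical, `--long` must not be
passed, and a tag whose own name ends in that pattern would be misclassified.

```python
import re

# Suffix added by `git describe --tags` when HEAD is some commits past the tag:
# "-<number of commits>-g<abbreviated hash>"
_PAST_TAG_SUFFIX = re.compile(r"-(\d+)-g([0-9a-f]+)$")


def is_exact_release(describe_output):
    """Return True if the describe output names a tag with no commits on top."""
    text = describe_output.strip()
    return bool(text) and _PAST_TAG_SUFFIX.search(text) is None
```

With this, "v0.17.1" counts as a release pin, while something like
"v0.17.1-12-gabc1234" (twelve commits past the tag) does not.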

> -Original Message-
> From: Zhao, Patric [mailto:patric.z...@intel.com]
> Sent: Wednesday, November 28, 2018 8:07 PM
> To: dev@mxnet.incubator.apache.org
> Subject: LSTM regression (was RE: Include MKLDNN into default mxnet pip
> package)
> 
> Hi Anirudh,
> 
> The LSTM performance bug has been fixed by MKL-DNN, and the PR is here
> (https://github.com/apache/incubator-mxnet/pull/13417).
>
> I am still working with the MKL-DNN team to get a patch release for MXNet
> 1.4 in 1 or 2 days.
> 
> Will update the status soon.
> 
> Thanks everyone.
> 
> --Patric
> 
> > -Original Message-
> > From: Anirudh Subramanian [mailto:anirudh2...@gmail.com]
> > Sent: Tuesday, November 27, 2018 6:16 AM
> > To: dev@mxnet.incubator.apache.org
> > Subject: Re: Include MKLDNN into default mxnet pip package
> >
> > Hi Tao,
> >
> > I agree with Steffen that we can start with a stable release for
> > MKLDNN for 1.4.0. For your suggestion on using 0.17, can you provide
> > info on what versioning mechanism MKLDNN uses. Once a MKLDNN release
> > is out and there are some regressions found like the LSTM regression,
> > would it be possible to do a patch release for it or maintain a release
> > branch for it ?
> >
> > Anirudh
> >
> > On Sun, Nov 25, 2018 at 5:03 PM Lv, Tao A  wrote:
> >
> > > Hi Steffen,
> > >
> > > I think all the commits on MKL-DNN master branch are well tested for
> > > MKL-DNN development team. If we really want to have a release commit
> > > in the coming 1.4 mxnet release, my suggestion is 0.17 MKL-DNN release.
> > >
> > > Thank you,
> > > Tao
> > >
> > > Sent from my iPhone
> > >
> > > > On Nov 26, 2018, at 8:09 AM, Steffen Rochel
> > > > 
> > > wrote:
> > > >
> > > > +1 to make MKL-DNN default.
> > > > I'm tracking
> > > > https://github.com/apache/incubator-mxnet/issues/13369
> > > > as open issue to be addressed for 1.4.0 I do agree that we should
> > > > move to a model to include released
> > > dependencies
> > > > instead of just taking bleeding edge snapshots.
> > > > However, speed of development is important as well.
> > > > As a compromise for 1.4.0 release with MKL-DNN: can the MKL-DNN
> > > development
> > > > team provide us with a well tested tag/commit id to include in
> > > > 1.4.0 release?
> > > > Steffen
> > > >
> > > >> On Wed, Nov 21, 2018 at 11:42 PM Lv, Tao A 
> > wrote:
> > > >>
> > > >> Thanks for the information, Kellen and Naveen.
> > > >>
> > > >> Better than onnx-tensorrt, MKL-DNN has already provided
> > > >> versioning and release tags. My concern is that as MKL-DNN is
> > > >> still under intensive development, if it has a new feature or bug
> > > >> fix on its master branch,
> > > do we
> > > >> really want to wait for next release to get it supported in MXNet?
> > > >>
> > > >> Take the LSTM regression as an example, probably MKL-DNN will
> > > >> give a fix or improvement on its master branch soon, do we need
> > > >> to wait for 0.18 release to get it fixed for mxnet user? AFAIK,
> > > >> tensorflow is also using normal commit id, not release, as the
> > > >> dependency for MKL-
> > DNN.
> > > >>
> > > >> Regarding the LSTM regression, we are using internal JIRA tickets
> > > >> rather than github issues to track the defects of MKL-DNN. But I
> > > >> agree with
> > > you,
> > > >> we need update the progress of it in Alex's issue.
> > > >>
> > > >> Thanks,
> > > >> -tao
> > > >>
> > > >> -Original Message-
> > > >> From: kellen sunderland [mailto:kellen.sunderl...@gmail.com]
> > > >> Sent: Thursday, November 22, 2018 10:55 AM
> > > >> To: dev@mxnet.incubator.apache.org
> > > >> Subject: Re: Include MKLDNN into default mxnet pip package
> > > >>
> > > >> Agree with your point about other repos also not being based on
> > > versioning
> > > >> Tao.  I would point out that I've given some that I've worked
> > > &

RE: Include MKLDNN into default mxnet pip package

2018-11-28 Thread Zhao, Patric
+1 for making MKL-DNN default in master branch first for broad testing :)

 My suggestion is to make MKL-DNN the default on the master branch first,
 after the 1.4.0 release branch is cut. That will help the MKL-DNN backend to
 be widely used and tested by MXNet users who are building MXNet from source.
 It will also help to expose issues in the MKL-DNN backend during the next
 release cycle. We can decide whether to make it the default in the pip
 package for the 1.5.0 release according to the feedback from the community.
 For the 1.4.0 release, we can still have MKL-DNN in the mxnet-mkl package.

> -Original Message-
> From: Zai, Alexander [mailto:alex...@amazon.com.INVALID]
> Sent: Thursday, November 29, 2018 4:06 AM
> To: dev@mxnet.incubator.apache.org
> Subject: Re: Include MKLDNN into default mxnet pip package
> 
> Thanks for answering Tao. I would like to add that we have the env flag that
> disables MKLDNN operators if regression occurs.
> 
> On 11/28/18, 6:05 AM, "Lv, Tao A"  wrote:
> 
> Hi Hagay, thank you for bringing these questions together. I have also
> summarized my opinions here so they are easy for you to check.
>
> - Make MKL-DNN default in MXNet pip package
> [Tao]: My suggestion is to make MKL-DNN the default on the master branch
> first, after the 1.4.0 release branch is cut. That will help the MKL-DNN
> backend to be widely used and tested by MXNet users who are building MXNet
> from source. It will also help to expose issues in the MKL-DNN backend
> during the next release cycle. We can decide whether to make it the default
> in the pip package for the 1.5.0 release according to the feedback from the
> community. For the 1.4.0 release, we can still have MKL-DNN in the mxnet-mkl
> package.
>
> - What the story is like when there's no AVX instructions present on CPUs.
> Do we get an illegal instruction error, or does it fallback gracefully?
> [Tao]: MKL-DNN has optimizations for every ISA starting with SSE4.2, and
> there is a list of the platforms that are officially supported by MKL-DNN:
> https://github.com/intel/mkl-dnn#system-requirements. It should fall back if
> AVX is not supported. Most of the computation-intensive kernels in MKL-DNN
> are JITed, so they are supposed to generate code according to the platform
> at runtime and should not contain any illegal instructions. Non-JIT code in
> MKL-DNN, like the other code in MXNet, is compiled into instructions
> according to the compiler options/flags. We can set -DARCH_OPT_FLAGS when
> building MKL-DNN to avoid optimizing for the machine it is compiled on.
> That's exactly what we are doing for the MKL-DNN build in MXNet. Even
> without MKL-DNN, I have noticed issues about illegal instructions in MXNet
> when users import the pip package on a lower-end machine which probably only
> supports SSE.
>
> - Are there any outstanding issues when MKLDNN is enabled?
> [Tao]: I don’t know of any at this time except the LSTM regression, which
> hopefully will be fixed soon. I notice the fix has been pushed to the
> MKL-DNN master branch. But if we decide to depend on release versions only,
> we need to wait for the MKL-DNN release process to finish. If anyone knows
> of other issues with the MKL-DNN backend, feel free to let me know. :)
>
> - MKLDNN is a submodule dependency, are we pulling the latest commit or
> releases? If not we should move to releases before we make it a default
> [Tao]: I don't have strong resistance to using a release version. But if you
> want to make a rule for MXNet that a submodule should depend on a release
> version, please take all the submodules into consideration. For MKL-DNN, my
> concern is: if the master (development) branch of MXNet relies on a
> bleeding-edge commit from the MKL-DNN master branch, then when MXNet comes
> to release, we need to revert many changes in MXNet if MKL-DNN does not have
> a new release at that time, since we need to fall back to a previous release
> version of the dependency. That might mess up or slow down the development
> and release of MXNet. To avoid that, we would always need to negotiate the
> release pace with the MKL-DNN team before every release. Please propose a
> solution for this situation and make a plan for how to apply it to all
> submodules.
> 
> - MKLDNN versioning mechanism
> [Tao]: Copied MKL-DNN manager’s words here:
> "That's valid request and I would expect that as the software matures
> more and more applications will rely on stable versions. I would expect that
> for MXNet there is a stable branch that would rely on stable MKL-DNN and
> development branch that would rely on master.
> MKL-DNN relies on semantic versioning. We do maintain a release
> branches in addition to master that can be used to release patches. In
> particular we are planning v0.17.1 this week to de

Re: Include MKLDNN into default mxnet pip package

2018-11-28 Thread Zai, Alexander
Thanks for answering, Tao. I would like to add that we have an env flag that
disables MKLDNN operators if a regression occurs.
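
As a rough illustration of the env flag mentioned above: the sketch below
assumes the flag is the MXNET_MKLDNN_ENABLED environment variable and that it
has to be set before the process imports mxnet. Neither detail is stated in
this thread, so please check the MXNet environment-variable docs before
relying on it. The helper name env_with_mkldnn is hypothetical.

```python
import os

# Assumed flag name; not taken from this thread.
MKLDNN_FLAG = "MXNET_MKLDNN_ENABLED"


def env_with_mkldnn(enabled, base=None):
    """Return a copy of the environment with the MKLDNN flag set to "1" or "0"."""
    env = dict(os.environ if base is None else base)
    env[MKLDNN_FLAG] = "1" if enabled else "0"
    return env


# Typical use: rerun a benchmark script with MKLDNN operators disabled.
#     import subprocess, sys
#     subprocess.run([sys.executable, "benchmark.py"], env=env_with_mkldnn(False))
```

Running the same script twice, once with each setting, is a quick way to
tell whether a suspected regression comes from the MKLDNN operators.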

On 11/28/18, 6:05 AM, "Lv, Tao A"  wrote:

Hi Hagay, thank you for bringing these questions together. I have also
summarized my opinions here so they are easy for you to check.

- Make MKL-DNN default in MXNet pip package
[Tao]: My suggestion is to make MKL-DNN the default on the master branch
first, after the 1.4.0 release branch is cut. That will help the MKL-DNN
backend to be widely used and tested by MXNet users who are building MXNet
from source. It will also help to expose issues in the MKL-DNN backend
during the next release cycle. We can decide whether to make it the default
in the pip package for the 1.5.0 release according to the feedback from the
community. For the 1.4.0 release, we can still have MKL-DNN in the mxnet-mkl
package.

- What the story is like when there's no AVX instructions present on CPUs.
Do we get an illegal instruction error, or does it fallback gracefully?
[Tao]: MKL-DNN has optimizations for every ISA starting with SSE4.2, and
there is a list of the platforms that are officially supported by MKL-DNN:
https://github.com/intel/mkl-dnn#system-requirements. It should fall back if
AVX is not supported. Most of the computation-intensive kernels in MKL-DNN
are JITed, so they are supposed to generate code according to the platform
at runtime and should not contain any illegal instructions. Non-JIT code in
MKL-DNN, like the other code in MXNet, is compiled into instructions
according to the compiler options/flags. We can set -DARCH_OPT_FLAGS when
building MKL-DNN to avoid optimizing for the machine it is compiled on.
That's exactly what we are doing for the MKL-DNN build in MXNet. Even
without MKL-DNN, I have noticed issues about illegal instructions in MXNet
when users import the pip package on a lower-end machine which probably only
supports SSE.

- Are there any outstanding issues when MKLDNN is enabled?
[Tao]: I don’t know of any at this time except the LSTM regression, which
hopefully will be fixed soon. I notice the fix has been pushed to the
MKL-DNN master branch. But if we decide to depend on release versions only,
we need to wait for the MKL-DNN release process to finish. If anyone knows
of other issues with the MKL-DNN backend, feel free to let me know. :)

- MKLDNN is a submodule dependency, are we pulling the latest commit or
releases? If not we should move to releases before we make it a default
[Tao]: I don't have strong resistance to using a release version. But if you
want to make a rule for MXNet that a submodule should depend on a release
version, please take all the submodules into consideration. For MKL-DNN, my
concern is: if the master (development) branch of MXNet relies on a
bleeding-edge commit from the MKL-DNN master branch, then when MXNet comes
to release, we need to revert many changes in MXNet if MKL-DNN does not have
a new release at that time, since we need to fall back to a previous release
version of the dependency. That might mess up or slow down the development
and release of MXNet. To avoid that, we would always need to negotiate the
release pace with the MKL-DNN team before every release. Please propose a
solution for this situation and make a plan for how to apply it to all
submodules.

- MKLDNN versioning mechanism
[Tao]: Copied MKL-DNN manager’s words here:
"That's valid request and I would expect that as the software matures more
and more applications will rely on stable versions. I would expect that for
MXNet there is a stable branch that would rely on stable MKL-DNN and
development branch that would rely on master.
MKL-DNN relies on semantic versioning. We do maintain a release branches in
addition to master that can be used to release patches. In particular we are
planning v0.17.1 this week to deliver a fix for reorders that you requested.
This works in the following way:
* master contains the latest development (typically the next release)
* rls-v0.17 contains v0.17 and will be used to create minor releases
(v0.17.1 and so on)"
I’m happy to see that MKL-DNN will have a patch release for the LSTM
regression issue.

-tao


-Original Message-
From: Hagay Lupesko [mailto:lupe...@gmail.com] 
Sent: Wednesday, November 28, 2018 4:22 PM
To: dev@mxnet.incubator.apache.org
Subject: Re: Include MKLDNN into default mxnet pip package

Hey all,

I'm also supportive of making MKLDNN the default build for MXNet, but there 
were a few questions asked in the thread that I am not sure were answered.
Would be great if Alex and others who worked on MKLDNN and that are 
proposing it to be the default can answer them clearly:
- What the story is like when there's no AVX instructions present on CPUs.
Do we get an illegal instruction error, or does it fallback gracefully?
(asked by Kellen)
   

RE: Include MKLDNN into default mxnet pip package

2018-11-28 Thread Lv, Tao A
Hi Hagay, thank you for bringing these questions together. I have also
summarized my opinions here so they are easy for you to check.

- Make MKL-DNN default in MXNet pip package
[Tao]: My suggestion is to make MKL-DNN the default on the master branch
first, after the 1.4.0 release branch is cut. That will help the MKL-DNN
backend to be widely used and tested by MXNet users who are building MXNet
from source. It will also help to expose issues in the MKL-DNN backend
during the next release cycle. We can decide whether to make it the default
in the pip package for the 1.5.0 release according to the feedback from the
community. For the 1.4.0 release, we can still have MKL-DNN in the mxnet-mkl
package.

- What the story is like when there's no AVX instructions present on CPUs.
Do we get an illegal instruction error, or does it fallback gracefully?
[Tao]: MKL-DNN has optimizations for every ISA starting with SSE4.2, and
there is a list of the platforms that are officially supported by MKL-DNN:
https://github.com/intel/mkl-dnn#system-requirements. It should fall back if
AVX is not supported. Most of the computation-intensive kernels in MKL-DNN
are JITed, so they are supposed to generate code according to the platform
at runtime and should not contain any illegal instructions. Non-JIT code in
MKL-DNN, like the other code in MXNet, is compiled into instructions
according to the compiler options/flags. We can set -DARCH_OPT_FLAGS when
building MKL-DNN to avoid optimizing for the machine it is compiled on.
That's exactly what we are doing for the MKL-DNN build in MXNet. Even
without MKL-DNN, I have noticed issues about illegal instructions in MXNet
when users import the pip package on a lower-end machine which probably only
supports SSE.

- Are there any outstanding issues when MKLDNN is enabled?
[Tao]: I don’t know of any at this time except the LSTM regression, which
hopefully will be fixed soon. I notice the fix has been pushed to the
MKL-DNN master branch. But if we decide to depend on release versions only,
we need to wait for the MKL-DNN release process to finish. If anyone knows
of other issues with the MKL-DNN backend, feel free to let me know. :)

- MKLDNN is a submodule dependency, are we pulling the latest commit or
releases? If not we should move to releases before we make it a default
[Tao]: I don't have strong resistance to using a release version. But if you
want to make a rule for MXNet that a submodule should depend on a release
version, please take all the submodules into consideration. For MKL-DNN, my
concern is: if the master (development) branch of MXNet relies on a
bleeding-edge commit from the MKL-DNN master branch, then when MXNet comes
to release, we need to revert many changes in MXNet if MKL-DNN does not have
a new release at that time, since we need to fall back to a previous release
version of the dependency. That might mess up or slow down the development
and release of MXNet. To avoid that, we would always need to negotiate the
release pace with the MKL-DNN team before every release. Please propose a
solution for this situation and make a plan for how to apply it to all
submodules.

- MKLDNN versioning mechanism
[Tao]: Copied MKL-DNN manager’s words here:
"That's valid request and I would expect that as the software matures more
and more applications will rely on stable versions. I would expect that for
MXNet there is a stable branch that would rely on stable MKL-DNN and
development branch that would rely on master.
MKL-DNN relies on semantic versioning. We do maintain a release branches in
addition to master that can be used to release patches. In particular we are
planning v0.17.1 this week to deliver a fix for reorders that you requested.
This works in the following way:
* master contains the latest development (typically the next release)
* rls-v0.17 contains v0.17 and will be used to create minor releases
(v0.17.1 and so on)"
I’m happy to see that MKL-DNN will have a patch release for the LSTM
regression issue.
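
The branch scheme quoted above can be sketched in a few lines. This assumes
tags look like "v0.17" or "v0.17.1" (as in the v0.17.1 tag) and that the
release branch for any tag is named "rls-v<major>.<minor>", which generalizes
from the single rls-v0.17 example. The function names are hypothetical.

```python
import re

# Accepts "v0.17" and "v0.17.1"; the patch component is optional.
_TAG = re.compile(r"^v(\d+)\.(\d+)(?:\.(\d+))?$")


def parse_tag(tag):
    """Turn a tag such as "v0.17.1" into a comparable (major, minor, patch) tuple."""
    match = _TAG.match(tag)
    if match is None:
        raise ValueError("unrecognized release tag: %r" % (tag,))
    major, minor, patch = match.groups()
    return int(major), int(minor), int(patch or 0)


def release_branch(tag):
    """Name the release branch a tag would live on (naming scheme is an assumption)."""
    major, minor, _patch = parse_tag(tag)
    return "rls-v%d.%d" % (major, minor)
```

Under these assumptions a patch release such as v0.17.1 maps to the same
branch as v0.17, and the tuples compare in release order, so v0.17.1 sorts
before v0.18.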

-tao


-Original Message-
From: Hagay Lupesko [mailto:lupe...@gmail.com] 
Sent: Wednesday, November 28, 2018 4:22 PM
To: dev@mxnet.incubator.apache.org
Subject: Re: Include MKLDNN into default mxnet pip package

Hey all,

I'm also supportive of making MKLDNN the default build for MXNet, but there 
were a few questions asked in the thread that I am not sure were answered.
Would be great if Alex and others who worked on MKLDNN and that are proposing 
it to be the default can answer them clearly:
- What the story is like when there's no AVX instructions present on CPUs.
Do we get an illegal instruction error, or does it fallback gracefully?
(asked by Kellen)
- Are there any outstanding issues when MKLDNN is enabled? (asked by Naveen)
- MKLDNN is a submodule dependency, are we pulling the latest commit or 
releases? If not we should move to releases before we make it a default
(Naveen)
  There was a discussion about MKLDNN version used by MXNet, and would be great 
if it can be summarized

LSTM regression (was RE: Include MKLDNN into default mxnet pip package)

2018-11-28 Thread Zhao, Patric
Hi Anirudh,

The LSTM performance bug has been fixed by MKL-DNN, and the PR is here
(https://github.com/apache/incubator-mxnet/pull/13417).

I am still working with the MKL-DNN team to get a patch release for MXNet 1.4
in 1 or 2 days.

Will update the status soon.

Thanks everyone.

--Patric

> -Original Message-
> From: Anirudh Subramanian [mailto:anirudh2...@gmail.com]
> Sent: Tuesday, November 27, 2018 6:16 AM
> To: dev@mxnet.incubator.apache.org
> Subject: Re: Include MKLDNN into default mxnet pip package
> 
> Hi Tao,
> 
> I agree with Steffen that we can start with a stable release for MKLDNN for
> 1.4.0. For your suggestion on using 0.17, can you provide info on what
> versioning mechanism MKLDNN uses. Once a MKLDNN release is out and
> there are some regressions found like the LSTM regression, would it be
> possible to do a patch release for it or maintain a release branch for it ?
> 
> Anirudh
> 
> On Sun, Nov 25, 2018 at 5:03 PM Lv, Tao A  wrote:
> 
> > Hi Steffen,
> >
> > I think all the commits on MKL-DNN master branch are well tested for
> > MKL-DNN development team. If we really want to have a release commit
> > in the coming 1.4 mxnet release, my suggestion is 0.17 MKL-DNN release.
> >
> > Thank you,
> > Tao
> >
> > Sent from my iPhone
> >
> > > On Nov 26, 2018, at 8:09 AM, Steffen Rochel
> > > 
> > wrote:
> > >
> > > +1 to make MKL-DNN default.
> > > I'm tracking  https://github.com/apache/incubator-mxnet/issues/13369
> > > as open issue to be addressed for 1.4.0 I do agree that we should
> > > move to a model to include released
> > dependencies
> > > instead of just taking bleeding edge snapshots.
> > > However, speed of development is important as well.
> > > As a compromise for 1.4.0 release with MKL-DNN: can the MKL-DNN
> > development
> > > team provide us with a well tested tag/commit id to include in 1.4.0
> > > release?
> > > Steffen
> > >
> > >> On Wed, Nov 21, 2018 at 11:42 PM Lv, Tao A 
> wrote:
> > >>
> > >> Thanks for the information, Kellen and Naveen.
> > >>
> > >> Better than onnx-tensorrt, MKL-DNN has already provided versioning
> > >> and release tags. My concern is that as MKL-DNN is still under
> > >> intensive development, if it has a new feature or bug fix on its
> > >> master branch,
> > do we
> > >> really want to wait for next release to get it supported in MXNet?
> > >>
> > >> Take the LSTM regression as an example, probably MKL-DNN will give
> > >> a fix or improvement on its master branch soon, do we need to wait
> > >> for 0.18 release to get it fixed for mxnet user? AFAIK, tensorflow
> > >> is also using normal commit id, not release, as the dependency for MKL-
> DNN.
> > >>
> > >> Regarding the LSTM regression, we are using internal JIRA tickets
> > >> rather than github issues to track the defects of MKL-DNN. But I
> > >> agree with
> > you,
> > >> we need update the progress of it in Alex's issue.
> > >>
> > >> Thanks,
> > >> -tao
> > >>
> > >> -Original Message-
> > >> From: kellen sunderland [mailto:kellen.sunderl...@gmail.com]
> > >> Sent: Thursday, November 22, 2018 10:55 AM
> > >> To: dev@mxnet.incubator.apache.org
> > >> Subject: Re: Include MKLDNN into default mxnet pip package
> > >>
> > >> Agree with your point about other repos also not being based on
> > versioning
> > >> Tao.  I would point out that I've given some that I've worked with
> > similar
> > >> feedback: https://github.com/onnx/onnx-tensorrt/issues/68
> > >>
> > >>> On Wed, Nov 21, 2018 at 6:48 PM Naveen Swamy
> 
> > wrote:
> > >>>
> > >>> Tao,
> > >>>
> > >>> You are right there are many submodules in 3rd party. We have to
> > >>> start somewhere and I believe this one is a good candidate to start
> with.
> > >>> This is not to cater to release of MXNet or to tie them with the
> > >>> releases of the submodules but instead to pick only stable
> > >>> releases and not to pick up bleeding edge commits from the tip of
> > >>> the master, this gives us confidence in the submodule that MXNet
> > >>> users are depending on that especially if we make MKLDNN the default.
> > >>>
> > >>> 

Re: Include MKLDNN into default mxnet pip package

2018-11-28 Thread Hagay Lupesko
Hey all,

I'm also supportive of making MKLDNN the default build for MXNet, but there
were a few questions asked in the thread that I am not sure were answered.
Would be great if Alex and others who worked on MKLDNN and that are
proposing it to be the default can answer them clearly:
- What the story is like when there's no AVX instructions present on CPUs.
Do we get an illegal instruction error, or does it fallback gracefully?
(asked by Kellen)
- Are there any outstanding issues when MKLDNN is enabled? (asked by Naveen)
- MKLDNN is a submodule dependency, are we pulling the latest commit or
releases? If not we should move to releases before we make it a default
(Naveen)
  There was a discussion about MKLDNN version used by MXNet, and would be
great if it can be summarized.

Hagay
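
On the AVX question, one way to see which instruction sets a Linux host
reports before installing an MKLDNN-enabled build is to read the "flags" line
of /proc/cpuinfo. The sketch below is an illustration only, not part of MXNet
or MKL-DNN: the helper names parse_cpu_flags and supports are hypothetical, it
assumes the x86 Linux /proc/cpuinfo layout, and it does not show whether the
library itself falls back gracefully.

```python
from pathlib import Path


def parse_cpu_flags(cpuinfo_text):
    """Return the set of CPU feature flags from /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # x86 Linux kernels list features on a line whose key is "flags".
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def supports(cpuinfo_text, *features):
    """True if every requested feature (e.g. "avx2") is listed."""
    return set(features) <= parse_cpu_flags(cpuinfo_text)


if __name__ == "__main__":
    path = Path("/proc/cpuinfo")  # Linux only; absent on macOS and Windows
    text = path.read_text() if path.exists() else ""
    for feature in ("sse4_2", "avx", "avx2", "avx512f"):
        print(feature, supports(text, feature))
```

A host that reports sse4_2 but not avx or avx2 is the kind of lower-end
machine the question is about.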



On Tue, Nov 27, 2018 at 6:21 PM Lv, Tao A  wrote:

> Hi Anirudh, please find the statements from MKL-DNN manager for the
> versioning mechanism of MKL-DNN library as below:
>
> "That's valid request and I would expect that as the software matures more
> and more applications will rely on stable versions. I would expect that for
> MXNet there is a stable branch that would rely on stable MKL-DNN and
> development branch that would rely on master.
>
> MKL-DNN relies on semantic versioning. We do maintain a release branches
> in addition to master that can be used to release patches. In particular we
> are planning v0.17.1 this week to deliver a fix for reorders that you
> requested. This works in the following way:
> * master contains the latest development (typically the next release)
> * rls-v0.17 contains v0.17 and will be used to create minor releases
> (v0.17.1 and so on)"
>
> I also restate my initial concern here: If the master (development) branch
> of MXNet relies on a bleeding edge commit from MKL-DNN master branch, when
> MXNet comes to release, we need revert many changes in MXNet if MKL-DNN
> will not have a new release at that time, since we need fallback the
> dependency to a previous release version. That might mess up or slow down
> the development and release of MXNet. To avoid that, we always need
> negotiate with MKL-DNN team for the release pace before every release.
>
> If you have any other questions about MKL-DNN's versioning, feel free to
> let me know. If you want to change and re-define the dependency behavior of
> MKL-DNN, please propose a solution for my concern and start a vote for that.
>
> Thanks,
> -tao
>
>
> -Original Message-
> From: Anirudh Subramanian [mailto:anirudh2...@gmail.com]
> Sent: Wednesday, November 28, 2018 8:26 AM
> To: dev@mxnet.incubator.apache.org
> Subject: Re: Include MKLDNN into default mxnet pip package
>
> Hi Tao,
>
> I was suggesting we can start using a release tag from mkldnn for major
> and minor releases of mxnet starting with 1.4.0. But this would require a
> versioning mechanism similar to semver for MKLDNN and  MKLDNN to do patch
> release to backport the bug fixes/regressions. I dont know if this is going
> to happen anytime soon (It would be nice if you can obtain some timeline
> from MKLDNN team on this). As long as the PIP still has two different
> packages for mkl and without mkl my vote is +1 for adding it as a default.
>
> Anirudh
>
>
> On Tue, Nov 27, 2018 at 5:04 AM Lv, Tao A  wrote:
>
> > Hi Anirudh,
> >
> > Just to confirm, you're focusing on the 1.4.0 release of MXNet and
> > want to have a release version of MKL-DNN there, right? Or do you mean
> > all the development in the future should base on the release version of
> MKL-DNN?
> > For the former one, I think 0.17 release of MKL-DNN is a good choice.
> > But it will not have fix for the LSTM regression mentioned in previous
> email.
> >
> > I'm talking about the versioning mechanism with MKL-DNN maintainers
> > and will be back to you if I get any response. But from the releasing
> > history of MKL-DNN, I cannot find any evidence about patch release.
> >
> > -tao
> >
> > -Original Message-
> > From: Anirudh Subramanian [mailto:anirudh2...@gmail.com]
> > Sent: Tuesday, November 27, 2018 6:16 AM
> > To: dev@mxnet.incubator.apache.org
> > Subject: Re: Include MKLDNN into default mxnet pip package
> >
> > Hi Tao,
> >
> > I agree with Steffen that we can start with a stable release for
> > MKLDNN for 1.4.0. For your suggestion on using 0.17, can you provide
> > info on what versioning mechanism MKLDNN uses. Once a MKLDNN release
> > is out and there are some regressions found like the LSTM regression,
> > would it be possible to do a patch release for it or maintain a release
> branch for it ?
> >
> > Anir

RE: Include MKLDNN into default mxnet pip package

2018-11-27 Thread Lv, Tao A
Hi Anirudh, please find below the statement from the MKL-DNN manager on the
versioning mechanism of the MKL-DNN library:

"That's valid request and I would expect that as the software matures more and 
more applications will rely on stable versions. I would expect that for MXNet 
there is a stable branch that would rely on stable MKL-DNN and development 
branch that would rely on master. 

MKL-DNN relies on semantic versioning. We do maintain a release branches in 
addition to master that can be used to release patches. In particular we are 
planning v0.17.1 this week to deliver a fix for reorders that you requested. 
This works in the following way:
* master contains the latest development (typically the next release)
* rls-v0.17 contains v0.17 and will be used to create minor releases (v0.17.1 
and so on)"

I also restate my initial concern here: if the master (development) branch of
MXNet relies on a bleeding-edge commit from the MKL-DNN master branch, then
when MXNet comes to release we will need to revert many changes in MXNet if
MKL-DNN does not have a new release at that time, since we would have to fall
back to a previous release version of the dependency. That might mess up or
slow down the development and release of MXNet. To avoid that, we would always
need to negotiate the release pace with the MKL-DNN team before every release.

If you have any other questions about MKL-DNN's versioning, feel free to let me 
know. If you want to change and re-define the dependency behavior of MKL-DNN, 
please propose a solution for my concern and start a vote for that.

Thanks,
-tao
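
The branch scheme quoted above (rls-v0.17 producing v0.17, v0.17.1 and so on)
means the newest patch release of a series can be picked mechanically from the
tag list. The following is a minimal sketch under stated assumptions: tags are
assumed to look like "v0.17" or "v0.17.1", the function names are hypothetical,
and the tag list in the usage note is made up for illustration rather than
taken from the MKL-DNN repository.

```python
import re

# Assumed tag shape: "vMAJOR.MINOR" with an optional ".PATCH" suffix.
_TAG = re.compile(r"^v(\d+)\.(\d+)(?:\.(\d+))?$")


def parse_tag(tag):
    """Parse 'v0.17' or 'v0.17.1' into (major, minor, patch); None otherwise."""
    m = _TAG.match(tag)
    if m is None:
        return None
    major, minor, patch = m.groups()
    return int(major), int(minor), int(patch or 0)


def latest_patch(tags, series):
    """Return the newest tag in a series such as 'v0.17', or None if absent."""
    base = parse_tag(series)
    if base is None:
        return None
    best = None
    for tag in tags:
        version = parse_tag(tag)
        if version is None or version[:2] != base[:2]:
            continue  # not a release tag, or a different release series
        if best is None or version > best[0]:
            best = (version, tag)
    return best[1] if best else None
```

With an illustrative list such as ["v0.16", "v0.17", "v0.17.1"],
latest_patch(tags, "v0.17") selects "v0.17.1", so a consumer pinned to the
v0.17 series would pick up the reorder fix without moving to master.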


-Original Message-
From: Anirudh Subramanian [mailto:anirudh2...@gmail.com] 
Sent: Wednesday, November 28, 2018 8:26 AM
To: dev@mxnet.incubator.apache.org
Subject: Re: Include MKLDNN into default mxnet pip package

Hi Tao,

I was suggesting we can start using a release tag from mkldnn for major and 
minor releases of mxnet starting with 1.4.0. But this would require a 
versioning mechanism similar to semver for MKLDNN and  MKLDNN to do patch 
release to backport the bug fixes/regressions. I dont know if this is going to 
happen anytime soon (It would be nice if you can obtain some timeline from 
MKLDNN team on this). As long as the PIP still has two different packages for 
mkl and without mkl my vote is +1 for adding it as a default.

Anirudh


On Tue, Nov 27, 2018 at 5:04 AM Lv, Tao A  wrote:

> Hi Anirudh,
>
> Just to confirm, you're focusing on the 1.4.0 release of MXNet and 
> want to have a release version of MKL-DNN there, right? Or do you mean 
> all the development in the future should base on the release version of 
> MKL-DNN?
> For the former one, I think 0.17 release of MKL-DNN is a good choice. 
> But it will not have fix for the LSTM regression mentioned in previous email.
>
> I'm talking about the versioning mechanism with MKL-DNN maintainers 
> and will be back to you if I get any response. But from the releasing 
> history of MKL-DNN, I cannot find any evidence about patch release.
>
> -tao
>
> -Original Message-
> From: Anirudh Subramanian [mailto:anirudh2...@gmail.com]
> Sent: Tuesday, November 27, 2018 6:16 AM
> To: dev@mxnet.incubator.apache.org
> Subject: Re: Include MKLDNN into default mxnet pip package
>
> Hi Tao,
>
> I agree with Steffen that we can start with a stable release for 
> MKLDNN for 1.4.0. For your suggestion on using 0.17, can you provide 
> info on what versioning mechanism MKLDNN uses. Once a MKLDNN release 
> is out and there are some regressions found like the LSTM regression, 
> would it be possible to do a patch release for it or maintain a release 
> branch for it ?
>
> Anirudh
>
> On Sun, Nov 25, 2018 at 5:03 PM Lv, Tao A  wrote:
>
> > Hi Steffen,
> >
> > I think all the commits on MKL-DNN master branch are well tested for 
> > MKL-DNN development team. If we really want to have a release commit 
> > in the coming 1.4 mxnet release, my suggestion is 0.17 MKL-DNN release.
> >
> > Thank you,
> > Tao
> >
> > Sent from my iPhone
> >
> > > On Nov 26, 2018, at 8:09 AM, Steffen Rochel 
> > > 
> > wrote:
> > >
> > > +1 to make MKL-DNN default.
> > > I'm tracking  
> > > https://github.com/apache/incubator-mxnet/issues/13369
> > > as open issue to be addressed for 1.4.0 I do agree that we should 
> > > move to a model to include released
> > dependencies
> > > instead of just taking bleeding edge snapshots.
> > > However, speed of development is important as well.
> > > As a compromise for 1.4.0 release with MKL-DNN: can the MKL-DNN
> > development
> > > team provide us with a well tested tag/commit id to include in 
> > > 1.4.0 release?
> > > Steffen
> > >
>

Re: Include MKLDNN into default mxnet pip package

2018-11-27 Thread Anirudh Subramanian
Hi Tao,

I was suggesting we can start using a release tag from mkldnn for major and
minor releases of mxnet starting with 1.4.0. But this would require a
versioning mechanism similar to semver for MKLDNN, and MKLDNN to do patch
releases to backport fixes for bugs/regressions. I don't know if this is going
to happen anytime soon (it would be nice if you can obtain some timeline
from the MKLDNN team on this). As long as PIP still has two different
packages, with and without mkl, my vote is +1 for adding it as a default.

Anirudh


On Tue, Nov 27, 2018 at 5:04 AM Lv, Tao A  wrote:

> Hi Anirudh,
>
> Just to confirm, you're focusing on the 1.4.0 release of MXNet and want to
> have a release version of MKL-DNN there, right? Or do you mean all the
> development in the future should base on the release version of MKL-DNN?
> For the former one, I think 0.17 release of MKL-DNN is a good choice. But
> it will not have fix for the LSTM regression mentioned in previous email.
>
> I'm talking about the versioning mechanism with MKL-DNN maintainers and
> will be back to you if I get any response. But from the releasing history
> of MKL-DNN, I cannot find any evidence about patch release.
>
> -tao
>
> -Original Message-
> From: Anirudh Subramanian [mailto:anirudh2...@gmail.com]
> Sent: Tuesday, November 27, 2018 6:16 AM
> To: dev@mxnet.incubator.apache.org
> Subject: Re: Include MKLDNN into default mxnet pip package
>
> Hi Tao,
>
> I agree with Steffen that we can start with a stable release for MKLDNN
> for 1.4.0. For your suggestion on using 0.17, can you provide info on what
> versioning mechanism MKLDNN uses. Once a MKLDNN release is out and there
> are some regressions found like the LSTM regression, would it be possible
> to do a patch release for it or maintain a release branch for it ?
>
> Anirudh
>
> On Sun, Nov 25, 2018 at 5:03 PM Lv, Tao A  wrote:
>
> > Hi Steffen,
> >
> > I think all the commits on MKL-DNN master branch are well tested for
> > MKL-DNN development team. If we really want to have a release commit
> > in the coming 1.4 mxnet release, my suggestion is 0.17 MKL-DNN release.
> >
> > Thank you,
> > Tao
> >
> > Sent from my iPhone
> >
> > > On Nov 26, 2018, at 8:09 AM, Steffen Rochel
> > > 
> > wrote:
> > >
> > > +1 to make MKL-DNN default.
> > > I'm tracking  https://github.com/apache/incubator-mxnet/issues/13369
> > > as open issue to be addressed for 1.4.0 I do agree that we should
> > > move to a model to include released
> > dependencies
> > > instead of just taking bleeding edge snapshots.
> > > However, speed of development is important as well.
> > > As a compromise for 1.4.0 release with MKL-DNN: can the MKL-DNN
> > development
> > > team provide us with a well tested tag/commit id to include in 1.4.0
> > > release?
> > > Steffen
> > >
> > >> On Wed, Nov 21, 2018 at 11:42 PM Lv, Tao A 
> wrote:
> > >>
> > >> Thanks for the information, Kellen and Naveen.
> > >>
> > >> Better than onnx-tensorrt, MKL-DNN has already provided versioning
> > >> and release tags. My concern is that as MKL-DNN is still under
> > >> intensive development, if it has a new feature or bug fix on its
> > >> master branch,
> > do we
> > >> really want to wait for next release to get it supported in MXNet?
> > >>
> > >> Take the LSTM regression as an example, probably MKL-DNN will give
> > >> a fix or improvement on its master branch soon, do we need to wait
> > >> for 0.18 release to get it fixed for mxnet user? AFAIK, tensorflow
> > >> is also using normal commit id, not release, as the dependency for
> MKL-DNN.
> > >>
> > >> Regarding the LSTM regression, we are using internal JIRA tickets
> > >> rather than github issues to track the defects of MKL-DNN. But I
> > >> agree with
> > you,
> > >> we need update the progress of it in Alex's issue.
> > >>
> > >> Thanks,
> > >> -tao
> > >>
> > >> -Original Message-
> > >> From: kellen sunderland [mailto:kellen.sunderl...@gmail.com]
> > >> Sent: Thursday, November 22, 2018 10:55 AM
> > >> To: dev@mxnet.incubator.apache.org
> > >> Subject: Re: Include MKLDNN into default mxnet pip package
> > >>
> > >> Agree with your point about other repos also not being based on
> > versioning
> > >> Tao.  I would point out that I've given some that I've 

RE: Include MKLDNN into default mxnet pip package

2018-11-27 Thread Lv, Tao A
Hi Anirudh,

Just to confirm, you're focusing on the 1.4.0 release of MXNet and want to have
a release version of MKL-DNN there, right? Or do you mean all future
development should be based on the release version of MKL-DNN? For the former,
I think the 0.17 release of MKL-DNN is a good choice. But it will not have a
fix for the LSTM regression mentioned in the previous email.

I'm discussing the versioning mechanism with the MKL-DNN maintainers and will
get back to you if I get any response. But from the release history of MKL-DNN,
I cannot find any evidence of patch releases.

-tao

-Original Message-
From: Anirudh Subramanian [mailto:anirudh2...@gmail.com] 
Sent: Tuesday, November 27, 2018 6:16 AM
To: dev@mxnet.incubator.apache.org
Subject: Re: Include MKLDNN into default mxnet pip package

Hi Tao,

I agree with Steffen that we can start with a stable release for MKLDNN for 
1.4.0. For your suggestion on using 0.17, can you provide info on what 
versioning mechanism MKLDNN uses. Once a MKLDNN release is out and there are 
some regressions found like the LSTM regression, would it be possible to do a 
patch release for it or maintain a release branch for it ?

Anirudh

On Sun, Nov 25, 2018 at 5:03 PM Lv, Tao A  wrote:

> Hi Steffen,
>
> I think all the commits on MKL-DNN master branch are well tested for 
> MKL-DNN development team. If we really want to have a release commit 
> in the coming 1.4 mxnet release, my suggestion is 0.17 MKL-DNN release.
>
> Thank you,
> Tao
>
> Sent from my iPhone
>
> > On Nov 26, 2018, at 8:09 AM, Steffen Rochel 
> > 
> wrote:
> >
> > +1 to make MKL-DNN default.
> > I'm tracking  https://github.com/apache/incubator-mxnet/issues/13369 
> > as open issue to be addressed for 1.4.0 I do agree that we should 
> > move to a model to include released
> dependencies
> > instead of just taking bleeding edge snapshots.
> > However, speed of development is important as well.
> > As a compromise for 1.4.0 release with MKL-DNN: can the MKL-DNN
> development
> > team provide us with a well tested tag/commit id to include in 1.4.0 
> > release?
> > Steffen
> >
> >> On Wed, Nov 21, 2018 at 11:42 PM Lv, Tao A  wrote:
> >>
> >> Thanks for the information, Kellen and Naveen.
> >>
> >> Better than onnx-tensorrt, MKL-DNN has already provided versioning 
> >> and release tags. My concern is that as MKL-DNN is still under 
> >> intensive development, if it has a new feature or bug fix on its 
> >> master branch,
> do we
> >> really want to wait for next release to get it supported in MXNet?
> >>
> >> Take the LSTM regression as an example, probably MKL-DNN will give 
> >> a fix or improvement on its master branch soon, do we need to wait 
> >> for 0.18 release to get it fixed for mxnet user? AFAIK, tensorflow 
> >> is also using normal commit id, not release, as the dependency for MKL-DNN.
> >>
> >> Regarding the LSTM regression, we are using internal JIRA tickets 
> >> rather than github issues to track the defects of MKL-DNN. But I 
> >> agree with
> you,
> >> we need update the progress of it in Alex's issue.
> >>
> >> Thanks,
> >> -tao
> >>
> >> -Original Message-
> >> From: kellen sunderland [mailto:kellen.sunderl...@gmail.com]
> >> Sent: Thursday, November 22, 2018 10:55 AM
> >> To: dev@mxnet.incubator.apache.org
> >> Subject: Re: Include MKLDNN into default mxnet pip package
> >>
> >> Agree with your point about other repos also not being based on
> versioning
> >> Tao.  I would point out that I've given some that I've worked with
> similar
> >> feedback: https://github.com/onnx/onnx-tensorrt/issues/68
> >>
> >>> On Wed, Nov 21, 2018 at 6:48 PM Naveen Swamy 
> wrote:
> >>>
> >>> Tao,
> >>>
> >>> You are right there are many submodules in 3rd party. We have to 
> >>> start somewhere and I believe this one is a good candidate to start with.
> >>> This is not to cater to release of MXNet or to tie them with the 
> >>> releases of the submodules but instead to pick only stable 
> >>> releases and not to pick up bleeding edge commits from the tip of 
> >>> the master, this gives us confidence in the submodule that MXNet 
> >>> users are depending on that especially if we make MKLDNN the default.
> >>>
> >>> Good to know it is known already as a regression.Alex has created 
> >>> this issue https://github.com/apache/incubator-mxnet/issues/13369,

Re: Include MKLDNN into default mxnet pip package

2018-11-26 Thread Anirudh Subramanian
Hi Tao,

I agree with Steffen that we can start with a stable release for MKLDNN for
1.4.0. For your suggestion on using 0.17, can you provide info on what
versioning mechanism MKLDNN uses? Once a MKLDNN release is out and some
regressions are found, like the LSTM regression, would it be possible
to do a patch release for it or maintain a release branch for it?

Anirudh

On Sun, Nov 25, 2018 at 5:03 PM Lv, Tao A  wrote:

> Hi Steffen,
>
> I think all the commits on MKL-DNN master branch are well tested for
> MKL-DNN development team. If we really want to have a release commit in the
> coming 1.4 mxnet release, my suggestion is 0.17 MKL-DNN release.
>
> Thank you,
> Tao
>
> Sent from my iPhone
>
> > On Nov 26, 2018, at 8:09 AM, Steffen Rochel 
> wrote:
> >
> > +1 to make MKL-DNN default.
> > I'm tracking  https://github.com/apache/incubator-mxnet/issues/13369 as
> > open issue to be addressed for 1.4.0
> > I do agree that we should move to a model to include released
> dependencies
> > instead of just taking bleeding edge snapshots.
> > However, speed of development is important as well.
> > As a compromise for 1.4.0 release with MKL-DNN: can the MKL-DNN
> development
> > team provide us with a well tested tag/commit id to include in 1.4.0
> > release?
> > Steffen
> >
> >> On Wed, Nov 21, 2018 at 11:42 PM Lv, Tao A  wrote:
> >>
> >> Thanks for the information, Kellen and Naveen.
> >>
> >> Better than onnx-tensorrt, MKL-DNN has already provided versioning and
> >> release tags. My concern is that as MKL-DNN is still under intensive
> >> development, if it has a new feature or bug fix on its master branch,
> do we
> >> really want to wait for next release to get it supported in MXNet?
> >>
> >> Take the LSTM regression as an example, probably MKL-DNN will give a fix
> >> or improvement on its master branch soon, do we need to wait for 0.18
> >> release to get it fixed for mxnet user? AFAIK, tensorflow is also using
> >> normal commit id, not release, as the dependency for MKL-DNN.
> >>
> >> Regarding the LSTM regression, we are using internal JIRA tickets rather
> >> than github issues to track the defects of MKL-DNN. But I agree with
> you,
> >> we need update the progress of it in Alex's issue.
> >>
> >> Thanks,
> >> -tao
> >>
> >> -Original Message-
> >> From: kellen sunderland [mailto:kellen.sunderl...@gmail.com]
> >> Sent: Thursday, November 22, 2018 10:55 AM
> >> To: dev@mxnet.incubator.apache.org
> >> Subject: Re: Include MKLDNN into default mxnet pip package
> >>
> >> Agree with your point about other repos also not being based on
> versioning
> >> Tao.  I would point out that I've given some that I've worked with
> similar
> >> feedback: https://github.com/onnx/onnx-tensorrt/issues/68
> >>
> >>> On Wed, Nov 21, 2018 at 6:48 PM Naveen Swamy 
> wrote:
> >>>
> >>> Tao,
> >>>
> >>> You are right there are many submodules in 3rd party. We have to start
> >>> somewhere and I believe this one is a good candidate to start with.
> >>> This is not to cater to release of MXNet or to tie them with the
> >>> releases of the submodules but instead to pick only stable releases
> >>> and not to pick up bleeding edge commits from the tip of the master,
> >>> this gives us confidence in the submodule that MXNet users are
> >>> depending on that especially if we make MKLDNN the default.
> >>>
> >>> Good to know it is known already as a regression.Alex has created this
> >>> issue https://github.com/apache/incubator-mxnet/issues/13369, please
> >>> add details and link the corresponding issue in MKLDNN(I couldn't
> find).
> >>>
> >>> -Naveen
> >>>
> >>>> On Wed, Nov 21, 2018 at 6:04 PM Lv, Tao A  wrote:
> >>>>
> >>>> Here are my answers for the questions from Kellen and Naveen about
> >>>> MKL-DNN. It doesn't mean that I'm supportive for making MKL-DNN
> >>>> default here.
> >>>>
> >>>> @Kellen,
> >>>>
> >>>> FYI, here is a list for those platforms which are officially
> >>>> supported by MKL-DNN.
> >>>> https://github.com/intel/mkl-dnn#system-requirements
> >>>>
> >>>> Most of computation intensive kernels in MKL-DNN are JITed. So they
> >>>> 

Re: Include MKLDNN into default mxnet pip package

2018-11-25 Thread Lv, Tao A
Hi Steffen, 

I think all the commits on the MKL-DNN master branch are well tested by the
MKL-DNN development team. If we really want to have a release commit in the
coming 1.4 mxnet release, my suggestion is the 0.17 MKL-DNN release.

Thank you,
Tao 

Sent from my iPhone

> On Nov 26, 2018, at 8:09 AM, Steffen Rochel  wrote:
> 
> +1 to make MKL-DNN default.
> I'm tracking  https://github.com/apache/incubator-mxnet/issues/13369 as
> open issue to be addressed for 1.4.0
> I do agree that we should move to a model to include released dependencies
> instead of just taking bleeding edge snapshots.
> However, speed of development is important as well.
> As a compromise for 1.4.0 release with MKL-DNN: can the MKL-DNN development
> team provide us with a well tested tag/commit id to include in 1.4.0
> release?
> Steffen
> 
>> On Wed, Nov 21, 2018 at 11:42 PM Lv, Tao A  wrote:
>> 
>> Thanks for the information, Kellen and Naveen.
>> 
>> Better than onnx-tensorrt, MKL-DNN has already provided versioning and
>> release tags. My concern is that as MKL-DNN is still under intensive
>> development, if it has a new feature or bug fix on its master branch, do we
>> really want to wait for next release to get it supported in MXNet?
>> 
>> Take the LSTM regression as an example, probably MKL-DNN will give a fix
>> or improvement on its master branch soon, do we need to wait for 0.18
>> release to get it fixed for mxnet user? AFAIK, tensorflow is also using
>> normal commit id, not release, as the dependency for MKL-DNN.
>> 
>> Regarding the LSTM regression, we are using internal JIRA tickets rather
>> than github issues to track the defects of MKL-DNN. But I agree with you,
>> we need update the progress of it in Alex's issue.
>> 
>> Thanks,
>> -tao
>> 
>> -Original Message-
>> From: kellen sunderland [mailto:kellen.sunderl...@gmail.com]
>> Sent: Thursday, November 22, 2018 10:55 AM
>> To: dev@mxnet.incubator.apache.org
>> Subject: Re: Include MKLDNN into default mxnet pip package
>> 
>> Agree with your point about other repos also not being based on versioning
>> Tao.  I would point out that I've given some that I've worked with similar
>> feedback: https://github.com/onnx/onnx-tensorrt/issues/68
>> 
>>> On Wed, Nov 21, 2018 at 6:48 PM Naveen Swamy  wrote:
>>> 
>>> Tao,
>>> 
>>> You are right there are many submodules in 3rd party. We have to start
>>> somewhere and I believe this one is a good candidate to start with.
>>> This is not to cater to release of MXNet or to tie them with the
>>> releases of the submodules but instead to pick only stable releases
>>> and not to pick up bleeding edge commits from the tip of the master,
>>> this gives us confidence in the submodule that MXNet users are
>>> depending on that especially if we make MKLDNN the default.
>>> 
>>> Good to know it is known already as a regression.Alex has created this
>>> issue https://github.com/apache/incubator-mxnet/issues/13369, please
>>> add details and link the corresponding issue in MKLDNN(I couldn't find).
>>> 
>>> -Naveen
>>> 
>>>> On Wed, Nov 21, 2018 at 6:04 PM Lv, Tao A  wrote:
>>>> 
>>>> Here are my answers for the questions from Kellen and Naveen about
>>>> MKL-DNN. It doesn't mean that I'm supportive for making MKL-DNN
>>>> default here.
>>>> 
>>>> @Kellen,
>>>> 
>>>> FYI, here is a list for those platforms which are officially
>>>> supported by MKL-DNN.
>>>> https://github.com/intel/mkl-dnn#system-requirements
>>>> 
>>>> Most of computation intensive kernels in MKL-DNN are JITed. So they
>>>> are supposed to generate code according to the platform during
>>>> runtime. For non-JIT code in MKL-DNN, same as other code in MXNet,
>>>> it will generate instructions according to the options/flags of
>>>> compiler. We can set -DARCH_OPT_FLAGS when build MKL-DNN to avoid
>>>> optimization for compiling machine. That's exactly what we are doing
>> for MKL-DNN build in MXNet.
>>> Even
>>>> without MKL-DNN, I noticed there were issues about illegal
>>>> instructions
>>> of
>>>> MXNet when users import the pip package on a lower end machine which
>>>> probably only supports SSE.
>>>> 
>>>> @Naveen,
>>>> 
>>>> The LSTM issue has already been identified as a regression from the
>>> recent

Re: Include MKLDNN into default mxnet pip package

2018-11-25 Thread Steffen Rochel
+1 to make MKL-DNN default.
I'm tracking https://github.com/apache/incubator-mxnet/issues/13369 as
an open issue to be addressed for 1.4.0.
I do agree that we should move to a model to include released dependencies
instead of just taking bleeding edge snapshots.
However, speed of development is important as well.
As a compromise for 1.4.0 release with MKL-DNN: can the MKL-DNN development
team provide us with a well tested tag/commit id to include in 1.4.0
release?
Steffen

On Wed, Nov 21, 2018 at 11:42 PM Lv, Tao A  wrote:

> Thanks for the information, Kellen and Naveen.
>
> Better than onnx-tensorrt, MKL-DNN has already provided versioning and
> release tags. My concern is that as MKL-DNN is still under intensive
> development, if it has a new feature or bug fix on its master branch, do we
> really want to wait for next release to get it supported in MXNet?
>
> Take the LSTM regression as an example, probably MKL-DNN will give a fix
> or improvement on its master branch soon, do we need to wait for 0.18
> release to get it fixed for mxnet user? AFAIK, tensorflow is also using
> normal commit id, not release, as the dependency for MKL-DNN.
>
> Regarding the LSTM regression, we are using internal JIRA tickets rather
> than github issues to track the defects of MKL-DNN. But I agree with you,
> we need update the progress of it in Alex's issue.
>
> Thanks,
> -tao
>
> -Original Message-
> From: kellen sunderland [mailto:kellen.sunderl...@gmail.com]
> Sent: Thursday, November 22, 2018 10:55 AM
> To: dev@mxnet.incubator.apache.org
> Subject: Re: Include MKLDNN into default mxnet pip package
>
> Agree with your point about other repos also not being based on versioning
> Tao.  I would point out that I've given some that I've worked with similar
> feedback: https://github.com/onnx/onnx-tensorrt/issues/68
>
> On Wed, Nov 21, 2018 at 6:48 PM Naveen Swamy  wrote:
>
> > Tao,
> >
> > You are right there are many submodules in 3rd party. We have to start
> > somewhere and I believe this one is a good candidate to start with.
> > This is not to cater to release of MXNet or to tie them with the
> > releases of the submodules but instead to pick only stable releases
> > and not to pick up bleeding edge commits from the tip of the master,
> > this gives us confidence in the submodule that MXNet users are
> > depending on that especially if we make MKLDNN the default.
> >
> > Good to know it is known already as a regression.Alex has created this
> > issue https://github.com/apache/incubator-mxnet/issues/13369, please
> > add details and link the corresponding issue in MKLDNN(I couldn't find).
> >
> > -Naveen
> >
> > On Wed, Nov 21, 2018 at 6:04 PM Lv, Tao A  wrote:
> >
> > > Here are my answers for the questions from Kellen and Naveen about
> > > MKL-DNN. It doesn't mean that I'm supportive for making MKL-DNN
> > > default here.
> > >
> > > @Kellen,
> > >
> > > FYI, here is a list for those platforms which are officially
> > > supported by MKL-DNN.
> > > https://github.com/intel/mkl-dnn#system-requirements
> > >
> > > Most of computation intensive kernels in MKL-DNN are JITed. So they
> > > are supposed to generate code according to the platform during
> > > runtime. For non-JIT code in MKL-DNN, same as other code in MXNet,
> > > it will generate instructions according to the options/flags of
> > > compiler. We can set -DARCH_OPT_FLAGS when build MKL-DNN to avoid
> > > optimization for compiling machine. That's exactly what we are doing
> > > for MKL-DNN build in MXNet. Even without MKL-DNN, I noticed there were
> > > issues about illegal instructions of MXNet when users import the pip
> > > package on a lower end machine which probably only supports SSE.
> > >
> > > @Naveen,
> > >
> > > The LSTM issue has already been identified as a regression from the
> > recent
> > > version of MKL-DNN. Hopefully it will be fixed soon with a new
> > > update of MKL-DNN.
> > >
> > > MXNet has many submodule dependencies under the 3rd party folder.
> > > Seems
> > we
> > > don't require release versions for most of these dependencies. The
> > release
> > > period of MKL-DNN and MXNet are not matched very well. I think it
> > > would
> > be
> > > a risk for MXNet release if it hardly depends on the release of a
> > > submodule, no need to say depends on the releases of all submodules.
> > >
> > > -tao
> > >
>

RE: Include MKLDNN into default mxnet pip package

2018-11-21 Thread Lv, Tao A
Thanks for the information, Kellen and Naveen.

Unlike onnx-tensorrt, MKL-DNN already provides versioning and release tags.
My concern is that MKL-DNN is still under intensive development: if a new
feature or bug fix lands on its master branch, do we really want to wait for
the next release to get it into MXNet?

Take the LSTM regression as an example. MKL-DNN will probably land a fix or
improvement on its master branch soon; do we need to wait for the 0.18 release
to get it fixed for MXNet users? AFAIK, TensorFlow also pins MKL-DNN to a
plain commit id, not a release.

Regarding the LSTM regression, we track MKL-DNN defects in internal JIRA
tickets rather than GitHub issues. But I agree with you that we need to
update its progress in Alex's issue.

Thanks,
-tao

-Original Message-
From: kellen sunderland [mailto:kellen.sunderl...@gmail.com] 
Sent: Thursday, November 22, 2018 10:55 AM
To: dev@mxnet.incubator.apache.org
Subject: Re: Include MKLDNN into default mxnet pip package

Agree with your point about other repos also not being based on versioning Tao. 
 I would point out that I've given some that I've worked with similar
feedback: https://github.com/onnx/onnx-tensorrt/issues/68

On Wed, Nov 21, 2018 at 6:48 PM Naveen Swamy  wrote:

> Tao,
>
> You are right there are many submodules in 3rd party. We have to start 
> somewhere and I believe this one is a good candidate to start with. 
> This is not to cater to release of MXNet or to tie them with the 
> releases of the submodules but instead to pick only stable releases 
> and not to pick up bleeding edge commits from the tip of the master, 
> this gives us confidence in the submodule that MXNet users are 
> depending on that especially if we make MKLDNN the default.
>
> Good to know it is known already as a regression.Alex has created this 
> issue https://github.com/apache/incubator-mxnet/issues/13369, please 
> add details and link the corresponding issue in MKLDNN(I couldn't find).
>
> -Naveen
>
> On Wed, Nov 21, 2018 at 6:04 PM Lv, Tao A  wrote:
>
> > Here are my answers for the questions from Kellen and Naveen about 
> > MKL-DNN. It doesn't mean that I'm supportive for making MKL-DNN 
> > default here.
> >
> > @Kellen,
> >
> > FYI, here is a list for those platforms which are officially 
> > supported by MKL-DNN.
> > https://github.com/intel/mkl-dnn#system-requirements
> >
> > Most of computation intensive kernels in MKL-DNN are JITed. So they 
> > are supposed to generate code according to the platform during 
> > runtime. For non-JIT code in MKL-DNN, same as other code in MXNet, 
> > it will generate instructions according to the options/flags of 
> > compiler. We can set -DARCH_OPT_FLAGS when build MKL-DNN to avoid 
> > optimization for compiling machine. That's exactly what we are doing for 
> > MKL-DNN build in MXNet.
> Even
> > without MKL-DNN, I noticed there were issues about illegal 
> > instructions
> of
> > MXNet when users import the pip package on a lower end machine which 
> > probably only supports SSE.
> >
> > @Naveen,
> >
> > The LSTM issue has already been identified as a regression from the
> recent
> > version of MKL-DNN. Hopefully it will be fixed soon with a new 
> > update of MKL-DNN.
> >
> > MXNet has many submodule dependencies under the 3rd party folder. 
> > Seems
> we
> > don't require release versions for most of these dependencies. The
> release
> > period of MKL-DNN and MXNet are not matched very well. I think it 
> > would
> be
> > a risk for MXNet release if it hardly depends on the release of a 
> > submodule, no need to say depends on the releases of all submodules.
> >
> > -tao
> >
> > -Original Message-
> > From: Naveen Swamy [mailto:mnnav...@gmail.com]
> > Sent: Thursday, November 22, 2018 9:08 AM
> > To: dev@mxnet.incubator.apache.org
> > Cc: d...@mxnet.apache.org
> > Subject: Re: Include MKLDNN into default mxnet pip package
> >
> > Hi Alex,
> >
> > Thanks for promptly running the numbers on AMD and reporting here.
> >
> > Can you please update the AMD numbers here for posterity
> >
> https://cwiki.apache.org/confluence/display/MXNET/MXNet+with+Intel+MKL
> -DNN+-+Performance+Benchmarking
> > ?
> >
> > are there any outstanding issues when MKLDNN is enabled? from my 
> > offline conversation I am briefly aware performance issues with 
> > LSTM, is there an GitHub issue for it?
> >
> > MKLDNN is a submodule dependency, are we pulling the latest commit 
> > or releases

Re: Include MKLDNN into default mxnet pip package

2018-11-21 Thread kellen sunderland
Agree with your point about other repos also not being based on versioning,
Tao.  I would point out that I've given similar feedback to some of the repos
I've worked with: https://github.com/onnx/onnx-tensorrt/issues/68

On Wed, Nov 21, 2018 at 6:48 PM Naveen Swamy  wrote:

> Tao,
>
> You are right there are many submodules in 3rd party. We have to start
> somewhere and I believe this one is a good candidate to start with. This is
> not to cater to release of MXNet or to tie them with the releases of the
> submodules but instead to pick only stable releases and not to pick up
> bleeding edge commits from the tip of the master, this gives us confidence
> in the submodule that MXNet users are depending on that especially if we
> make MKLDNN the default.
>
> Good to know it is known already as a regression.Alex has created this
> issue https://github.com/apache/incubator-mxnet/issues/13369, please add
> details and link the corresponding issue in MKLDNN(I couldn't find).
>
> -Naveen
>
> On Wed, Nov 21, 2018 at 6:04 PM Lv, Tao A  wrote:
>
> > Here are my answers for the questions from Kellen and Naveen about
> > MKL-DNN. It doesn't mean that I'm supportive for making MKL-DNN default
> > here.
> >
> > @Kellen,
> >
> > FYI, here is a list for those platforms which are officially supported by
> > MKL-DNN.
> > https://github.com/intel/mkl-dnn#system-requirements
> >
> > Most of computation intensive kernels in MKL-DNN are JITed. So they are
> > supposed to generate code according to the platform during runtime. For
> > non-JIT code in MKL-DNN, same as other code in MXNet, it will generate
> > instructions according to the options/flags of compiler. We can set
> > -DARCH_OPT_FLAGS when build MKL-DNN to avoid optimization for compiling
> > machine. That's exactly what we are doing for MKL-DNN build in MXNet.
> Even
> > without MKL-DNN, I noticed there were issues about illegal instructions
> of
> > MXNet when users import the pip package on a lower end machine which
> > probably only supports SSE.
> >
> > @Naveen,
> >
> > The LSTM issue has already been identified as a regression from the
> recent
> > version of MKL-DNN. Hopefully it will be fixed soon with a new update of
> > MKL-DNN.
> >
> > MXNet has many submodule dependencies under the 3rd party folder. Seems
> we
> > don't require release versions for most of these dependencies. The
> release
> > period of MKL-DNN and MXNet are not matched very well. I think it would
> be
> > a risk for MXNet release if it hardly depends on the release of a
> > submodule, no need to say depends on the releases of all submodules.
> >
> > -tao
> >
> > -Original Message-
> > From: Naveen Swamy [mailto:mnnav...@gmail.com]
> > Sent: Thursday, November 22, 2018 9:08 AM
> > To: dev@mxnet.incubator.apache.org
> > Cc: d...@mxnet.apache.org
> > Subject: Re: Include MKLDNN into default mxnet pip package
> >
> > Hi Alex,
> >
> > Thanks for promptly running the numbers on AMD and reporting here.
> >
> > Can you please update the AMD numbers here for posterity
> >
> https://cwiki.apache.org/confluence/display/MXNET/MXNet+with+Intel+MKL-DNN+-+Performance+Benchmarking
> > ?
> >
> > are there any outstanding issues when MKLDNN is enabled? from my offline
> > conversation I am briefly aware performance issues with LSTM, is there an
> > GitHub issue for it?
> >
> > MKLDNN is a submodule dependency, are we pulling the latest commit or
> > releases  ? If not we should move to releases before we make it a
> default.
> > Ideally we should use platform specific distributions (-dev packages) at
> > least we should rely on well tested releases.
> >
> >
> > Thanks, Naveen
> >
> > On Wed, Nov 21, 2018 at 4:55 PM Zai, Alexander
>  > >
> > wrote:
> >
> > > AMD benchmarks have been published. We are seeing a x15.8 speedup with
> > > Resnet50 (batch size 32) on AWS's new m5a.24xlarge machine. With a
> > > smaller network (Mobilenet - batch size 32) the speedup is more
> > > significant at x38.7. Let's have a vote to see if the PR to have
> > > MKLDNN enabled by default
> > > (https://github.com/apache/incubator-mxnet/pull/12591) can be merged
> > > before 1.4.0 release.
> > >
> > > On 10/19/18, 9:17 AM, "Pedro Larroy" 
> > > wrote:
> > >
> > > I did  pip install mxnet-mkl==1.3.1b20181018 on an AMD Ryzen 1950X
> > > and unit
> > > tests are passing.
> > >

Re: Include MKLDNN into default mxnet pip package

2018-11-21 Thread Naveen Swamy
Tao,

You are right that there are many submodules in 3rd party. We have to start
somewhere, and I believe this one is a good candidate to start with. This is
not about catering to MXNet releases or tying them to the releases of the
submodules; it is about picking up only stable releases rather than
bleeding-edge commits from the tip of master. That gives us confidence in the
submodule that MXNet users depend on, especially if we make MKLDNN the
default.
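
To make the "releases only" policy concrete, here is a minimal sketch of a check that a pinned submodule commit corresponds to a release tag. The function name and the tag-to-commit mapping are hypothetical; how the mapping is gathered from the submodule is left open, and nothing like this exists in the MXNet build today as far as this thread shows.

```python
def release_tag_for(pinned_commit, tags):
    """Return the release tag that points at pinned_commit, or None.

    tags is a hypothetical mapping of tag name -> full commit SHA,
    gathered by whatever means from the submodule's repository.
    """
    for name, sha in sorted(tags.items()):
        if sha == pinned_commit:
            return name
    return None


# Placeholder SHAs for illustration only; these are not real MKL-DNN commits.
EXAMPLE_TAGS = {"v0.17": "a" * 40, "v0.17.1": "b" * 40}
```

A CI step could call this with the submodule's pinned SHA and fail when it returns None, which would flag a pin to an arbitrary commit on master.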

Good to know it is already known as a regression. Alex has created this
issue https://github.com/apache/incubator-mxnet/issues/13369, please add
details and link the corresponding issue in MKLDNN (I couldn't find one).

-Naveen

On Wed, Nov 21, 2018 at 6:04 PM Lv, Tao A  wrote:

> Here are my answers for the questions from Kellen and Naveen about
> MKL-DNN. It doesn't mean that I'm supportive for making MKL-DNN default
> here.
>
> @Kellen,
>
> FYI, here is a list for those platforms which are officially supported by
> MKL-DNN.
> https://github.com/intel/mkl-dnn#system-requirements
>
> Most of computation intensive kernels in MKL-DNN are JITed. So they are
> supposed to generate code according to the platform during runtime. For
> non-JIT code in MKL-DNN, same as other code in MXNet, it will generate
> instructions according to the options/flags of compiler. We can set
> -DARCH_OPT_FLAGS when build MKL-DNN to avoid optimization for compiling
> machine. That's exactly what we are doing for MKL-DNN build in MXNet. Even
> without MKL-DNN, I noticed there were issues about illegal instructions of
> MXNet when users import the pip package on a lower end machine which
> probably only supports SSE.
>
> @Naveen,
>
> The LSTM issue has already been identified as a regression from the recent
> version of MKL-DNN. Hopefully it will be fixed soon with a new update of
> MKL-DNN.
>
> MXNet has many submodule dependencies under the 3rd party folder. Seems we
> don't require release versions for most of these dependencies. The release
> period of MKL-DNN and MXNet are not matched very well. I think it would be
> a risk for MXNet release if it hardly depends on the release of a
> submodule, no need to say depends on the releases of all submodules.
>
> -tao
>
> -Original Message-
> From: Naveen Swamy [mailto:mnnav...@gmail.com]
> Sent: Thursday, November 22, 2018 9:08 AM
> To: dev@mxnet.incubator.apache.org
> Cc: d...@mxnet.apache.org
> Subject: Re: Include MKLDNN into default mxnet pip package
>
> Hi Alex,
>
> Thanks for promptly running the numbers on AMD and reporting here.
>
> Can you please update the AMD numbers here for posterity
> https://cwiki.apache.org/confluence/display/MXNET/MXNet+with+Intel+MKL-DNN+-+Performance+Benchmarking
> ?
>
> are there any outstanding issues when MKLDNN is enabled? from my offline
> conversation I am briefly aware performance issues with LSTM, is there an
> GitHub issue for it?
>
> MKLDNN is a submodule dependency, are we pulling the latest commit or
> releases  ? If not we should move to releases before we make it a default.
> Ideally we should use platform specific distributions (-dev packages) at
> least we should rely on well tested releases.
>
>
> Thanks, Naveen
>
> On Wed, Nov 21, 2018 at 4:55 PM Zai, Alexander  >
> wrote:
>
> > AMD benchmarks have been published. We are seeing a x15.8 speedup with
> > Resnet50 (batch size 32) on AWS's new m5a.24xlarge machine. With a
> > smaller network (Mobilenet - batch size 32) the speedup is more
> > significant at x38.7. Let's have a vote to see if the PR to have
> > MKLDNN enabled by default
> > (https://github.com/apache/incubator-mxnet/pull/12591) can be merged
> > before 1.4.0 release.
> >
> > On 10/19/18, 9:17 AM, "Pedro Larroy" 
> > wrote:
> >
> > I did  pip install mxnet-mkl==1.3.1b20181018 on an AMD Ryzen 1950X
> > and unit
> > tests are passing.
> >
> > Is this build using AVX512?  in /proc/cpuinfo I see only "avx" flag.
> > There's no "avx2" like on recent intel cpus.
> >
> > Pedro.
> >
> > On Fri, Oct 19, 2018 at 5:12 PM Hagay Lupesko 
> > wrote:
> >
> > > Awesome collaborative effort across many contributors and
> companies!
> > >
> > > The boost is impressive and for MXNet users to get this boost
> > "out of the
> > > box" is a great benefit and makes MXNet an even better choice.
> > >
> > > Alex - can you clarify whether there are any down sides with
> > regards to
> > > noon AVX-512 architectures, AMD CPUs, etc? Will it gracef

RE: Include MKLDNN into default mxnet pip package

2018-11-21 Thread Zhao, Patric
Hi Kellen,

Thank you very much for your recognition of our work :)

This is great joint work by the community (Wu Jun, Zheng Da, etc.) and the
Intel team.

We are continuously improving the quantization flow, and more features will
be ready soon.

Thanks,

--Patric

> -Original Message-
> From: kellen sunderland [mailto:kellen.sunderl...@gmail.com]
> Sent: Thursday, November 22, 2018 9:07 AM
> To: dev@mxnet.incubator.apache.org
> Cc: d...@mxnet.apache.org
> Subject: Re: Include MKLDNN into default mxnet pip package
> 
> I've spent the last few days testing MXNet w/ MKLDNN and quantized models
> and it's a beast.  Really good speed improvements on my models, no bugs
> that I've noticed.
> 
> I'm in general supportive but I'm still wondering what the story is like when
> there's no AVX instructions present on CPUs.  Do we get an illegal instruction
> error, or does it fallback gracefully?  So far it sounds like it works on a
> Threadripper and Xen AMD CPU.  I can try on a Ryzen.  What about older
> Intel or AMD CPUs?
> 
> On Wed, Nov 21, 2018 at 4:55 PM Zai, Alexander
> 
> wrote:
> 
> > AMD benchmarks have been published. We are seeing a x15.8 speedup
> with
> > Resnet50 (batch size 32) on AWS's new m5a.24xlarge machine. With a
> > smaller network (Mobilenet - batch size 32) the speedup is more
> > significant at x38.7. Let's have a vote to see if the PR to have
> > MKLDNN enabled by default
> > (https://github.com/apache/incubator-mxnet/pull/12591) can be merged
> > before 1.4.0 release.
> >
> > On 10/19/18, 9:17 AM, "Pedro Larroy" 
> > wrote:
> >
> > I did  pip install mxnet-mkl==1.3.1b20181018 on an AMD Ryzen 1950X
> > and unit
> > tests are passing.
> >
> > Is this build using AVX512?  in /proc/cpuinfo I see only "avx" flag.
> > There's no "avx2" like on recent intel cpus.
> >
> > Pedro.
> >
> > On Fri, Oct 19, 2018 at 5:12 PM Hagay Lupesko 
> > wrote:
> >
> > > Awesome collaborative effort across many contributors and companies!
> > >
> > > The boost is impressive and for MXNet users to get this boost
> > "out of the
> > > box" is a great benefit and makes MXNet an even better choice.
> > >
> > > Alex - can you clarify whether there are any down sides with
> > regards to
> > > noon AVX-512 architectures, AMD CPUs, etc? Will it gracefully
> > fallback?
> > >
> > > Hagay
> > >
> > >
> > > On Fri, Oct 19, 2018, 15:46 Sergio Fernández 
> > wrote:
> > >
> > > > If there is no downside on platforms not supporting AVX512
> > instructions,
> > > > then +1
> > > >
> > > >
> > > > On Wed, Oct 17, 2018, 14:10 Alex Zai  wrote:
> > > >
> > > > > Hey all,
> > > > > We have been working hard these past few months to integrate and
> > > > stabilize
> > > > > Intel’s MKLDNN deep learning CPU accelerator into Mxnet and
> > have made
> > > > > incredible progress. On CPUs with AVX512 instructions (such
> > as
> > c5.18x)
> > > we
> > > > > have seen performance increase up to 12x and on other
> > platforms (Macs,
> > > > > AVX2) we seen a speedup of 1.5+. Full list of benchmarks can
> > be found
> > > > here
> > > > > (
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=956507
> 64
> > > > >  and https://github.com/apache/incubator-mxnet/pull/12591).
> > > > >
> > > > > Currently, using this accelerator requires the developer to
> > either pip
> > > > > install the mxnet-mkl version of mxnet or to build it
> > themselves from
> > > > > source. Given that we should try to provide the best
> > performance "out
> > > of
> > > > > the box” with mxnet we should include this in the default build.
> > The
> > > > mkldnn
> > > > > library is included with in the pip package build so it does not
> > > require
> > > > an
> > > > > external dependency.
> > > > >
> > > > > There were concerns that MKLDNN could cause regressions on
> > certain
> > > > > platforms (as it did with the tensorflow version a while
> > back); but we
> > > > > added a env flag (MXNET_MKLDNN_ENABLED) that allows users to
> > turn of
> > > this
> > > > > feature during runtime. Please bring up any other concerns
> > you may have
> > > > and
> > > > > your thoughts on including this accelerator in the default build.
> > > > >
> > > > > Best,
> > > > > Alex
> > > > >
> > > >
> > >
> >
> >
> >


RE: Include MKLDNN into default mxnet pip package

2018-11-21 Thread Lv, Tao A
Here are my answers to the questions from Kellen and Naveen about MKL-DNN.
Answering them does not mean that I support making MKL-DNN the default here.

@Kellen,

FYI, here is the list of platforms that are officially supported by MKL-DNN:
https://github.com/intel/mkl-dnn#system-requirements

Most of the computation-intensive kernels in MKL-DNN are JITed, so they are
expected to generate code for the host platform at runtime. Non-JIT code in
MKL-DNN, like the other code in MXNet, is compiled to instructions determined
by the compiler options/flags. We can set -DARCH_OPT_FLAGS when building
MKL-DNN to avoid optimizing for the build machine. That's exactly what we are
doing for the MKL-DNN build in MXNet. Even without MKL-DNN, I have noticed
issues about illegal instructions in MXNet when users import the pip package
on a lower-end machine that probably only supports SSE.
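
The runtime selection described above can be pictured with a small sketch. This is not MKL-DNN's actual dispatcher: the preference order, the kernel table and the function names are all hypothetical, and every "kernel" here is the same plain Python function. It only illustrates picking the widest ISA the CPU reports and falling back to portable code otherwise.

```python
# Hypothetical preference order, widest vector ISA first.
ISA_PREFERENCE = ("avx512f", "avx2", "avx", "sse4_2")


def pick_kernel(cpu_flags, kernels):
    """Pick the best available kernel for the given CPU flags.

    kernels maps an ISA name to a callable; "generic" is the portable
    fallback and must always be present.
    """
    for isa in ISA_PREFERENCE:
        if isa in cpu_flags and isa in kernels:
            return isa, kernels[isa]
    return "generic", kernels["generic"]


def dot_generic(a, b):
    # Portable reference implementation with no ISA assumptions.
    return sum(x * y for x, y in zip(a, b))


# In a real library each entry would be code generated for that ISA;
# here every entry reuses the same function for illustration.
KERNELS = {"avx512f": dot_generic, "avx2": dot_generic, "generic": dot_generic}
```

Code that is compiled ahead of time has no such selection step, which is why the compiler flags used for the pip build decide whether it runs on older CPUs.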

@Naveen,

The LSTM issue has already been identified as a regression in a recent
version of MKL-DNN. Hopefully it will be fixed soon with a new update of
MKL-DNN.

MXNet has many submodule dependencies under the 3rd party folder. It seems we
don't require release versions for most of these dependencies. The release
schedules of MKL-DNN and MXNet do not line up very well. I think it would be
a risk for an MXNet release if it depended strictly on the release of a
submodule, not to mention the releases of all submodules.

-tao

-Original Message-
From: Naveen Swamy [mailto:mnnav...@gmail.com] 
Sent: Thursday, November 22, 2018 9:08 AM
To: dev@mxnet.incubator.apache.org
Cc: d...@mxnet.apache.org
Subject: Re: Include MKLDNN into default mxnet pip package

Hi Alex,

Thanks for promptly running the numbers on AMD and reporting here.

Can you please update the AMD numbers here for posterity 
https://cwiki.apache.org/confluence/display/MXNET/MXNet+with+Intel+MKL-DNN+-+Performance+Benchmarking
?

are there any outstanding issues when MKLDNN is enabled? from my offline 
conversation I am briefly aware performance issues with LSTM, is there an 
GitHub issue for it?

MKLDNN is a submodule dependency, are we pulling the latest commit or releases  
? If not we should move to releases before we make it a default.
Ideally we should use platform specific distributions (-dev packages) at least 
we should rely on well tested releases.


Thanks, Naveen

On Wed, Nov 21, 2018 at 4:55 PM Zai, Alexander 
wrote:

> AMD benchmarks have been published. We are seeing a x15.8 speedup with
> Resnet50 (batch size 32) on AWS's new m5a.24xlarge machine. With a 
> smaller network (Mobilenet - batch size 32) the speedup is more 
> significant at x38.7. Let's have a vote to see if the PR to have 
> MKLDNN enabled by default
> (https://github.com/apache/incubator-mxnet/pull/12591) can be merged 
> before 1.4.0 release.
>
> On 10/19/18, 9:17 AM, "Pedro Larroy" 
> wrote:
>
> I did  pip install mxnet-mkl==1.3.1b20181018 on an AMD Ryzen 1950X 
> and unit
> tests are passing.
>
> Is this build using AVX512?  in /proc/cpuinfo I see only "avx" flag.
> There's no "avx2" like on recent intel cpus.
>
> Pedro.
>
> On Fri, Oct 19, 2018 at 5:12 PM Hagay Lupesko 
> wrote:
>
> > Awesome collaborative effort across many contributors and companies!
> >
> > The boost is impressive and for MXNet users to get this boost 
> "out of the
> > box" is a great benefit and makes MXNet an even better choice.
> >
> > Alex - can you clarify whether there are any down sides with 
> regards to
> > noon AVX-512 architectures, AMD CPUs, etc? Will it gracefully 
> fallback?
> >
> > Hagay
> >
> >
> > On Fri, Oct 19, 2018, 15:46 Sergio Fernández 
> wrote:
> >
> > > If there is no downside on platforms not supporting AVX512 
> instructions,
> > > then +1
> > >
> > >
> > > On Wed, Oct 17, 2018, 14:10 Alex Zai  wrote:
> > >
> > > > Hey all,
> > > > We have been working hard these past few months to integrate and
> > > stabilize
> > > > Intel’s MKLDNN deep learning CPU accelerator into Mxnet and 
> have made
> > > > incredible progress. On CPUs with AVX512 instructions (such 
> as
> c5.18x)
> > we
> > > > have seen performance increase up to 12x and on other 
> platforms (Macs,
> > > > AVX2) we seen a speedup of 1.5+. Full list of benchmarks can 
> be found
> > > here
> > > > (
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=95650764
>  

Re: Include MKLDNN into default mxnet pip package

2018-11-21 Thread Naveen Swamy
Hi Alex,

Thanks for promptly running the numbers on AMD and reporting here.

Can you please update the AMD numbers here for posterity?
https://cwiki.apache.org/confluence/display/MXNET/MXNet+with+Intel+MKL-DNN+-+Performance+Benchmarking

Are there any outstanding issues when MKLDNN is enabled? From my offline
conversations I am vaguely aware of performance issues with LSTM. Is there a
GitHub issue for it?

MKLDNN is a submodule dependency. Are we pulling the latest commit or
releases? If not releases, we should move to releases before we make it the
default. Ideally we should use platform-specific distributions (-dev
packages); at the very least we should rely on well-tested releases.


Thanks, Naveen

On Wed, Nov 21, 2018 at 4:55 PM Zai, Alexander 
wrote:

> AMD benchmarks have been published. We are seeing a x15.8 speedup with
> Resnet50 (batch size 32) on AWS's new m5a.24xlarge machine. With a smaller
> network (Mobilenet - batch size 32) the speedup is more significant at
> x38.7. Let's have a vote to see if the PR to have MKLDNN enabled by default
> (https://github.com/apache/incubator-mxnet/pull/12591) can be merged
> before 1.4.0 release.
>
> On 10/19/18, 9:17 AM, "Pedro Larroy" 
> wrote:
>
> I did  pip install mxnet-mkl==1.3.1b20181018 on an AMD Ryzen 1950X and
> unit
> tests are passing.
>
> Is this build using AVX512?  in /proc/cpuinfo I see only "avx" flag.
> There's no "avx2" like on recent intel cpus.
>
> Pedro.
>
> On Fri, Oct 19, 2018 at 5:12 PM Hagay Lupesko 
> wrote:
>
> > Awesome collaborative effort across many contributors and companies!
> >
> > The boost is impressive and for MXNet users to get this boost "out
> of the
> > box" is a great benefit and makes MXNet an even better choice.
> >
> > Alex - can you clarify whether there are any down sides with regards
> to
> > noon AVX-512 architectures, AMD CPUs, etc? Will it gracefully
> fallback?
> >
> > Hagay
> >
> >
> > On Fri, Oct 19, 2018, 15:46 Sergio Fernández 
> wrote:
> >
> > > If there is no downside on platforms not supporting AVX512
> instructions,
> > > then +1
> > >
> > >
> > > On Wed, Oct 17, 2018, 14:10 Alex Zai  wrote:
> > >
> > > > Hey all,
> > > > We have been working hard these past few months to integrate and
> > > stabilize
> > > > Intel’s MKLDNN deep learning CPU accelerator into Mxnet and have
> made
> > > > incredible progress. On CPUs with AVX512 instructions (such as
> c5.18x)
> > we
> > > > have seen performance increase up to 12x and on other platforms
> (Macs,
> > > > AVX2) we seen a speedup of 1.5+. Full list of benchmarks can be
> found
> > > here
> > > > (
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=95650764
> > > >  and https://github.com/apache/incubator-mxnet/pull/12591).
> > > >
> > > > Currently, using this accelerator requires the developer to
> either pip
> > > > install the mxnet-mkl version of mxnet or to build it themselves
> from
> > > > source. Given that we should try to provide the best performance
> "out
> > of
> > > > the box” with mxnet we should include this in the default build.
> The
> > > mkldnn
> > > > library is included with in the pip package build so it does not
> > require
> > > an
> > > > external dependency.
> > > >
> > > > There were concerns that MKLDNN could cause regressions on
> certain
> > > > platforms (as it did with the tensorflow version a while back);
> but we
> > > > added a env flag (MXNET_MKLDNN_ENABLED) that allows users to
> turn of
> > this
> > > > feature during runtime. Please bring up any other concerns you
> may have
> > > and
> > > > your thoughts on including this accelerator in the default build.
> > > >
> > > > Best,
> > > > Alex
> > > >
> > >
> >
>
>
>


Re: Include MKLDNN into default mxnet pip package

2018-11-21 Thread kellen sunderland
I've spent the last few days testing MXNet w/ MKLDNN and quantized models
and it's a beast.  Really good speed improvements on my models, no bugs
that I've noticed.

I'm in general supportive, but I'm still wondering what the story is like
when there are no AVX instructions present on the CPU.  Do we get an illegal
instruction error, or does it fall back gracefully?  So far it sounds like
it works on a Threadripper and Xen AMD CPU.  I can try on a Ryzen.  What
about older Intel or AMD CPUs?

On Wed, Nov 21, 2018 at 4:55 PM Zai, Alexander 
wrote:

> AMD benchmarks have been published. We are seeing a x15.8 speedup with
> Resnet50 (batch size 32) on AWS's new m5a.24xlarge machine. With a smaller
> network (Mobilenet - batch size 32) the speedup is more significant at
> x38.7. Let's have a vote to see if the PR to have MKLDNN enabled by default
> (https://github.com/apache/incubator-mxnet/pull/12591) can be merged
> before 1.4.0 release.
>
> On 10/19/18, 9:17 AM, "Pedro Larroy" 
> wrote:
>
> I did  pip install mxnet-mkl==1.3.1b20181018 on an AMD Ryzen 1950X and
> unit
> tests are passing.
>
> Is this build using AVX512?  in /proc/cpuinfo I see only "avx" flag.
> There's no "avx2" like on recent intel cpus.
>
> Pedro.
>
> On Fri, Oct 19, 2018 at 5:12 PM Hagay Lupesko 
> wrote:
>
> > Awesome collaborative effort across many contributors and companies!
> >
> > The boost is impressive and for MXNet users to get this boost "out
> of the
> > box" is a great benefit and makes MXNet an even better choice.
> >
> > Alex - can you clarify whether there are any down sides with regards
> to
> > noon AVX-512 architectures, AMD CPUs, etc? Will it gracefully
> fallback?
> >
> > Hagay
> >
> >
> > On Fri, Oct 19, 2018, 15:46 Sergio Fernández 
> wrote:
> >
> > > If there is no downside on platforms not supporting AVX512
> instructions,
> > > then +1
> > >
> > >
> > > On Wed, Oct 17, 2018, 14:10 Alex Zai  wrote:
> > >
> > > > Hey all,
> > > > We have been working hard these past few months to integrate and
> > > stabilize
> > > > Intel’s MKLDNN deep learning CPU accelerator into Mxnet and have
> made
> > > > incredible progress. On CPUs with AVX512 instructions (such as
> c5.18x)
> > we
> > > > have seen performance increase up to 12x and on other platforms
> (Macs,
> > > > AVX2) we seen a speedup of 1.5+. Full list of benchmarks can be
> found
> > > here
> > > > (
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=95650764
> > > >  and https://github.com/apache/incubator-mxnet/pull/12591).
> > > >
> > > > Currently, using this accelerator requires the developer to
> either pip
> > > > install the mxnet-mkl version of mxnet or to build it themselves
> from
> > > > source. Given that we should try to provide the best performance
> "out
> > of
> > > > the box” with mxnet we should include this in the default build.
> The
> > > mkldnn
> > > > library is included with in the pip package build so it does not
> > require
> > > an
> > > > external dependency.
> > > >
> > > > There were concerns that MKLDNN could cause regressions on
> certain
> > > > platforms (as it did with the tensorflow version a while back);
> but we
> > > > added a env flag (MXNET_MKLDNN_ENABLED) that allows users to
> turn of
> > this
> > > > feature during runtime. Please bring up any other concerns you
> may have
> > > and
> > > > your thoughts on including this accelerator in the default build.
> > > >
> > > > Best,
> > > > Alex
> > > >
> > >
> >
>
>
>


Re: Include MKLDNN into default mxnet pip package

2018-11-21 Thread Zai, Alexander
AMD benchmarks have been published. We are seeing a 15.8x speedup with
Resnet50 (batch size 32) on AWS's new m5a.24xlarge machine. With a smaller
network (Mobilenet, batch size 32) the speedup is more significant at 38.7x.
Let's have a vote to see if the PR to have MKLDNN enabled by default
(https://github.com/apache/incubator-mxnet/pull/12591) can be merged before
the 1.4.0 release.

On 10/19/18, 9:17 AM, "Pedro Larroy"  wrote:

I did  pip install mxnet-mkl==1.3.1b20181018 on an AMD Ryzen 1950X and unit
tests are passing.

Is this build using AVX512?  in /proc/cpuinfo I see only "avx" flag.
There's no "avx2" like on recent intel cpus.

Pedro.

On Fri, Oct 19, 2018 at 5:12 PM Hagay Lupesko  wrote:

> Awesome collaborative effort across many contributors and companies!
>
> The boost is impressive and for MXNet users to get this boost "out of the
> box" is a great benefit and makes MXNet an even better choice.
>
> Alex - can you clarify whether there are any down sides with regards to
> noon AVX-512 architectures, AMD CPUs, etc? Will it gracefully fallback?
>
> Hagay
>
>
> On Fri, Oct 19, 2018, 15:46 Sergio Fernández  wrote:
>
> > If there is no downside on platforms not supporting AVX512 instructions,
> > then +1
> >
> >
> > On Wed, Oct 17, 2018, 14:10 Alex Zai  wrote:
> >
> > > Hey all,
> > > We have been working hard these past few months to integrate and stabilize
> > > Intel’s MKLDNN deep learning CPU accelerator into MXNet and have made
> > > incredible progress. On CPUs with AVX512 instructions (such as c5.18x) we
> > > have seen performance increase up to 12x, and on other platforms (Macs,
> > > AVX2) we have seen a speedup of 1.5x or more. A full list of benchmarks can
> > > be found here (
> > > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=95650764
> > > and https://github.com/apache/incubator-mxnet/pull/12591).
> > >
> > > Currently, using this accelerator requires the developer to either pip
> > > install the mxnet-mkl version of mxnet or build it themselves from
> > > source. Given that we should try to provide the best performance "out of
> > > the box" with mxnet, we should include this in the default build. The mkldnn
> > > library is included within the pip package build, so it does not require an
> > > external dependency.
> > >
> > > There were concerns that MKLDNN could cause regressions on certain
> > > platforms (as it did with the TensorFlow version a while back), but we
> > > added an env flag (MXNET_MKLDNN_ENABLED) that allows users to turn off this
> > > feature at runtime. Please bring up any other concerns you may have, along
> > > with your thoughts on including this accelerator in the default build.
> > >
> > > Best,
> > > Alex
> > >
> >
>




Re: Include MKLDNN into default mxnet pip package

2018-10-19 Thread Pedro Larroy
I ran pip install mxnet-mkl==1.3.1b20181018 on an AMD Ryzen 1950X and the unit
tests are passing.

Is this build using AVX512? In /proc/cpuinfo I see only the "avx" flag.
There's no "avx2" like on recent Intel CPUs.

Pedro.
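
The /proc/cpuinfo check described above can be scripted. The following is a minimal sketch, assuming a Linux host whose /proc/cpuinfo has a "flags" line; the helper names are hypothetical and are not part of MXNet or MKLDNN. It only reports which AVX-family flags the CPU advertises and says nothing about which code path a given build actually takes.

```python
def simd_flags(cpuinfo_text):
    """Collect AVX-family flags from /proc/cpuinfo-style text."""
    found = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # Only the "flags" lines list CPU feature flags.
        if sep and key.strip() == "flags":
            found.update(f for f in value.split() if f.startswith("avx"))
    return found


def highest_avx_level(flags):
    """Map a set of flags to the widest AVX family it contains."""
    if any(f.startswith("avx512") for f in flags):
        return "avx512"
    if "avx2" in flags:
        return "avx2"
    if "avx" in flags:
        return "avx"
    return "none"


def read_cpuinfo(path="/proc/cpuinfo"):
    """Read the cpuinfo file (Linux only)."""
    with open(path) as fh:
        return fh.read()
```

On a Linux machine, highest_avx_level(simd_flags(read_cpuinfo())) gives a one-word summary that can be pasted into a report like the one above.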

On Fri, Oct 19, 2018 at 5:12 PM Hagay Lupesko  wrote:

> Awesome collaborative effort across many contributors and companies!
>
> The boost is impressive and for MXNet users to get this boost "out of the
> box" is a great benefit and makes MXNet an even better choice.
>
> Alex - can you clarify whether there are any downsides with regard to
> non-AVX-512 architectures, AMD CPUs, etc.? Will it gracefully fall back?
>
> Hagay
>
>
> On Fri, Oct 19, 2018, 15:46 Sergio Fernández  wrote:
>
> > If there is no downside on platforms not supporting AVX512 instructions,
> > then +1
> >
> >
> > On Wed, Oct 17, 2018, 14:10 Alex Zai  wrote:
> >
> > > Hey all,
> > > We have been working hard these past few months to integrate and stabilize
> > > Intel’s MKLDNN deep learning CPU accelerator into MXNet and have made
> > > incredible progress. On CPUs with AVX512 instructions (such as c5.18x) we
> > > have seen performance increase up to 12x, and on other platforms (Macs,
> > > AVX2) we have seen a speedup of 1.5x or more. A full list of benchmarks can
> > > be found here (
> > > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=95650764
> > > and https://github.com/apache/incubator-mxnet/pull/12591).
> > >
> > > Currently, using this accelerator requires the developer to either pip
> > > install the mxnet-mkl version of mxnet or build it themselves from
> > > source. Given that we should try to provide the best performance "out of
> > > the box" with mxnet, we should include this in the default build. The mkldnn
> > > library is included within the pip package build, so it does not require an
> > > external dependency.
> > >
> > > There were concerns that MKLDNN could cause regressions on certain
> > > platforms (as it did with the TensorFlow version a while back), but we
> > > added an env flag (MXNET_MKLDNN_ENABLED) that allows users to turn off this
> > > feature at runtime. Please bring up any other concerns you may have, along
> > > with your thoughts on including this accelerator in the default build.
> > >
> > > Best,
> > > Alex
> > >
> >
>


Re: Include MKLDNN into default mxnet pip package

2018-10-19 Thread Hagay Lupesko
Awesome collaborative effort across many contributors and companies!

The boost is impressive and for MXNet users to get this boost "out of the
box" is a great benefit and makes MXNet an even better choice.

Alex - can you clarify whether there are any downsides with regard to
non-AVX-512 architectures, AMD CPUs, etc.? Will it gracefully fall back?

Hagay


On Fri, Oct 19, 2018, 15:46 Sergio Fernández  wrote:

> If there is no downside on platforms not supporting AVX512 instructions,
> then +1
>
>
> On Wed, Oct 17, 2018, 14:10 Alex Zai  wrote:
>
> > Hey all,
> > We have been working hard these past few months to integrate and stabilize
> > Intel’s MKLDNN deep learning CPU accelerator into MXNet and have made
> > incredible progress. On CPUs with AVX512 instructions (such as c5.18x) we
> > have seen performance increase up to 12x, and on other platforms (Macs,
> > AVX2) we have seen a speedup of 1.5x or more. A full list of benchmarks can
> > be found here (
> > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=95650764
> > and https://github.com/apache/incubator-mxnet/pull/12591).
> >
> > Currently, using this accelerator requires the developer to either pip
> > install the mxnet-mkl version of mxnet or build it themselves from
> > source. Given that we should try to provide the best performance "out of
> > the box" with mxnet, we should include this in the default build. The mkldnn
> > library is included within the pip package build, so it does not require an
> > external dependency.
> >
> > There were concerns that MKLDNN could cause regressions on certain
> > platforms (as it did with the TensorFlow version a while back), but we
> > added an env flag (MXNET_MKLDNN_ENABLED) that allows users to turn off this
> > feature at runtime. Please bring up any other concerns you may have, along
> > with your thoughts on including this accelerator in the default build.
> >
> > Best,
> > Alex
> >
>


Re: Include MKLDNN into default mxnet pip package

2018-10-19 Thread Sergio Fernández
If there is no downside on platforms not supporting AVX512 instructions,
then +1


On Wed, Oct 17, 2018, 14:10 Alex Zai  wrote:

> Hey all,
> We have been working hard these past few months to integrate and stabilize
> Intel’s MKLDNN deep learning CPU accelerator into MXNet and have made
> incredible progress. On CPUs with AVX512 instructions (such as c5.18x) we
> have seen performance increase up to 12x, and on other platforms (Macs,
> AVX2) we have seen a speedup of 1.5x or more. A full list of benchmarks can
> be found here (
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=95650764
> and https://github.com/apache/incubator-mxnet/pull/12591).
>
> Currently, using this accelerator requires the developer to either pip
> install the mxnet-mkl version of mxnet or build it themselves from
> source. Given that we should try to provide the best performance "out of
> the box" with mxnet, we should include this in the default build. The mkldnn
> library is included within the pip package build, so it does not require an
> external dependency.
>
> There were concerns that MKLDNN could cause regressions on certain
> platforms (as it did with the TensorFlow version a while back), but we
> added an env flag (MXNET_MKLDNN_ENABLED) that allows users to turn off this
> feature at runtime. Please bring up any other concerns you may have, along
> with your thoughts on including this accelerator in the default build.
>
> Best,
> Alex
>


Re: Include MKLDNN into default mxnet pip package

2018-10-18 Thread Pedro Larroy
Very nice! 

Pedro

> On 17. Oct 2018, at 23:12, Alfredo Luque  
> wrote:
> 
> This is huge. Thanks for working on this. Is there a similar plan for, e.g.,
> TensorRT support being ported into the main cuda-9.x packages?
>
> On October 17, 2018 at 2:10:20 PM, Alex Zai (aza...@gmail.com) wrote:
>
> Hey all,
> We have been working hard these past few months to integrate and stabilize
> Intel’s MKLDNN deep learning CPU accelerator into MXNet and have made
> incredible progress. On CPUs with AVX512 instructions (such as c5.18x) we
> have seen performance increase up to 12x, and on other platforms (Macs,
> AVX2) we have seen a speedup of 1.5x or more. A full list of benchmarks can
> be found here (
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=95650764
> and https://github.com/apache/incubator-mxnet/pull/12591).
>
> Currently, using this accelerator requires the developer to either pip
> install the mxnet-mkl version of mxnet or build it themselves from
> source. Given that we should try to provide the best performance "out of
> the box" with mxnet, we should include this in the default build. The mkldnn
> library is included within the pip package build, so it does not require an
> external dependency.
>
> There were concerns that MKLDNN could cause regressions on certain
> platforms (as it did with the TensorFlow version a while back), but we
> added an env flag (MXNET_MKLDNN_ENABLED) that allows users to turn off this
> feature at runtime. Please bring up any other concerns you may have, along
> with your thoughts on including this accelerator in the default build.
> 
> Best,
> Alex
> 
> —
> Alfredo Luque
> Software Engineer
> Machine Learning Infrastructure
> Airbnb
> San Francisco, CA


RE: Include MKLDNN into default mxnet pip package

2018-10-18 Thread Zhao, Patric
Thanks, Alex, for bringing up this proposal. As far as I know, with the
MKL-DNN backend, MXNet is now the most performant framework on the CPU side,
especially since the recent subgraph fusion feature boosts performance a lot
again.
Thus, I think it’s worth making it the default so that more users can benefit
from it.

Regarding the MKL-DNN integration, it’s joint work that has taken a lot of effort from
Amazon and Intel engineers, including Da, Jun, Haibin, Junyuan, Sheng, Marco,
Chris (AWS) and Patric, Tao, Wenting, Rong, Jin, Shufan, Ashok (Intel).
We also got many great suggestions from the MXNet community and learned much from
those discussions. Here I personally want to thank Da Zheng for his great
efforts in this project.
As the main contributor, he has played an important role in the project, from the
initial co-design and implementation to the recent advanced subgraph feature, and
has finally made these good things happen.

I would like to thank Alex for stabilizing the MKL-DNN backend by adding more tests
for it, as well as environment variables so that users can switch easily between the
original flow and the MKL-DNN flow.
His efforts are really helpful in pushing the MKL-DNN backend from experimental
toward GA.

The MXNet community is one of the best groups, and there are many intelligent people
here.

Thank you all for the strong support.

--Patric 

> -----Original Message-----
> From: Jun Wu [mailto:wujun@gmail.com]
> Sent: Thursday, October 18, 2018 6:29 AM
> To: dev@mxnet.incubator.apache.org
> Cc: d...@mxnet.apache.org; aza...@gmail.com
> Subject: Re: Include MKLDNN into default mxnet pip package
> 
> If my understanding is correct about the context, it should be acknowledged
> that the significant performance improvement comes from the Intel
> MKLDNN team's contribution in this PR:
> https://github.com/apache/incubator-mxnet/pull/12530.
> 
> On Wed, Oct 17, 2018 at 3:12 PM kellen sunderland <
> kellen.sunderl...@gmail.com> wrote:
> 
> > First of all thanks to Intel for these improvements, really a great effort.
> >
> > What would the compatibility story look like for users that don't have
> > these AVX instructions?  Would there be any negative effect for AMD users?
> >
> > Regarding TensorRT: It's a possibility but not planned in the short
> > term. A few considerations would be the limits on PyPI package sizes
> > and the bloat incurred with TRT, the requirement for TRT to be
> > installed on the user side, and the TRT engine build times, which are
> > non-trivial.  We can work towards fixing or working around these
> > issues in the future if default TRT is something the user community
> > would like to see for CUDA packages.  While the feature is
> > experimental we'll likely continue to use 'mxnet-tensorrt-cu92' and
> > 'mxnet-tensorrt-cu90'.
> >
> > On Wed, Oct 17, 2018 at 2:12 PM Alfredo Luque
> >  wrote:
> >
> > > > This is huge. Thanks for working on this. Is there a similar plan
> > > > for, e.g., TensorRT support being ported into the main cuda-9.x
> > > > packages?
> > > >
> > > > On October 17, 2018 at 2:10:20 PM, Alex Zai (aza...@gmail.com) wrote:
> > > >
> > > > Hey all,
> > > > We have been working hard these past few months to integrate and stabilize
> > > > Intel’s MKLDNN deep learning CPU accelerator into MXNet and have made
> > > > incredible progress. On CPUs with AVX512 instructions (such as c5.18x) we
> > > > have seen performance increase up to 12x, and on other platforms (Macs,
> > > > AVX2) we have seen a speedup of 1.5x or more. A full list of benchmarks can
> > > > be found here (
> > > > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=95650764
> > > > and https://github.com/apache/incubator-mxnet/pull/12591).
> > > >
> > > > Currently, using this accelerator requires the developer to either pip
> > > > install the mxnet-mkl version of mxnet or build it themselves from
> > > > source. Given that we should try to provide the best performance "out of
> > > > the box" with mxnet, we should include this in the default build. The mkldnn
> > > > library is included within the pip package build, so it does not require an
> > > > external dependency.
> > > >
> > > > There were concerns that MKLDNN could cause regressions on certain
> > > > platforms (as it did with the TensorFlow version a while back), but we
> > > > added an env flag (MXNET_MKLDNN_ENABLED) that allows users to turn off this
> > > > feature at runtime. Please bring up any other concerns you may have, along
> > > > with your thoughts on including this accelerator in the default build.
> > >
> > > Best,
> > > Alex
> > >
> > > —
> > > Alfredo Luque
> > > Software Engineer
> > > Machine Learning Infrastructure
> > > Airbnb
> > > San Francisco, CA
> > >
> >


Re: Include MKLDNN into default mxnet pip package

2018-10-17 Thread Jun Wu
If my understanding is correct about the context, it should be acknowledged
that the significant performance improvement comes from the Intel MKLDNN
team's contribution in this PR:
https://github.com/apache/incubator-mxnet/pull/12530.

On Wed, Oct 17, 2018 at 3:12 PM kellen sunderland <
kellen.sunderl...@gmail.com> wrote:

> First of all thanks to Intel for these improvements, really a great effort.
>
> What would the compatibility story look like for users that don't have
> these AVX instructions?  Would there be any negative effect for AMD users?
>
> Regarding TensorRT: It's a possibility but not planned in the short term. A
> few considerations would be the limits on PyPI package sizes and the bloat
> incurred with TRT, the requirement for TRT to be installed on the user
> side, and the TRT engine build times, which are non-trivial.  We can work
> towards fixing or working around these issues in the future if default TRT
> is something the user community would like to see for CUDA packages.  While
> the feature is experimental we'll likely continue to use
> 'mxnet-tensorrt-cu92' and 'mxnet-tensorrt-cu90'.
>
> On Wed, Oct 17, 2018 at 2:12 PM Alfredo Luque
>  wrote:
>
> > This is huge. Thanks for working on this. Is there a similar plan for, e.g.,
> > TensorRT support being ported into the main cuda-9.x packages?
> >
> > On October 17, 2018 at 2:10:20 PM, Alex Zai (aza...@gmail.com) wrote:
> >
> > Hey all,
> > We have been working hard these past few months to integrate and stabilize
> > Intel’s MKLDNN deep learning CPU accelerator into MXNet and have made
> > incredible progress. On CPUs with AVX512 instructions (such as c5.18x) we
> > have seen performance increase up to 12x, and on other platforms (Macs,
> > AVX2) we have seen a speedup of 1.5x or more. A full list of benchmarks can
> > be found here (
> > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=95650764
> > and https://github.com/apache/incubator-mxnet/pull/12591).
> >
> > Currently, using this accelerator requires the developer to either pip
> > install the mxnet-mkl version of mxnet or build it themselves from
> > source. Given that we should try to provide the best performance "out of
> > the box" with mxnet, we should include this in the default build. The mkldnn
> > library is included within the pip package build, so it does not require an
> > external dependency.
> >
> > There were concerns that MKLDNN could cause regressions on certain
> > platforms (as it did with the TensorFlow version a while back), but we
> > added an env flag (MXNET_MKLDNN_ENABLED) that allows users to turn off this
> > feature at runtime. Please bring up any other concerns you may have, along
> > with your thoughts on including this accelerator in the default build.
> >
> > Best,
> > Alex
> >
> > —
> > Alfredo Luque
> > Software Engineer
> > Machine Learning Infrastructure
> > Airbnb
> > San Francisco, CA
> >
>


Re: Include MKLDNN into default mxnet pip package

2018-10-17 Thread kellen sunderland
First of all thanks to Intel for these improvements, really a great effort.

What would the compatibility story look like for users that don't have
these AVX instructions?  Would there be any negative effect for AMD users?

Regarding TensorRT: It's a possibility but not planned in the short term. A
few considerations would be the limits on PyPI package sizes and the bloat
incurred with TRT, the requirement for TRT to be installed on the user
side, and the TRT engine build times, which are non-trivial.  We can work
towards fixing or working around these issues in the future if default TRT
is something the user community would like to see for CUDA packages.  While
the feature is experimental we'll likely continue to use
'mxnet-tensorrt-cu92' and 'mxnet-tensorrt-cu90'.

On Wed, Oct 17, 2018 at 2:12 PM Alfredo Luque
 wrote:

> This is huge. Thanks for working on this. Is there a similar plan for, e.g.,
> TensorRT support being ported into the main cuda-9.x packages?
>
> On October 17, 2018 at 2:10:20 PM, Alex Zai (aza...@gmail.com) wrote:
>
> Hey all,
> We have been working hard these past few months to integrate and stabilize
> Intel’s MKLDNN deep learning CPU accelerator into MXNet and have made
> incredible progress. On CPUs with AVX512 instructions (such as c5.18x) we
> have seen performance increase up to 12x, and on other platforms (Macs,
> AVX2) we have seen a speedup of 1.5x or more. A full list of benchmarks can
> be found here (
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=95650764
> and https://github.com/apache/incubator-mxnet/pull/12591).
>
> Currently, using this accelerator requires the developer to either pip
> install the mxnet-mkl version of mxnet or build it themselves from
> source. Given that we should try to provide the best performance "out of
> the box" with mxnet, we should include this in the default build. The mkldnn
> library is included within the pip package build, so it does not require an
> external dependency.
>
> There were concerns that MKLDNN could cause regressions on certain
> platforms (as it did with the TensorFlow version a while back), but we
> added an env flag (MXNET_MKLDNN_ENABLED) that allows users to turn off this
> feature at runtime. Please bring up any other concerns you may have, along
> with your thoughts on including this accelerator in the default build.
>
> Best,
> Alex
>
> —
> Alfredo Luque
> Software Engineer
> Machine Learning Infrastructure
> Airbnb
> San Francisco, CA
>


Re: Include MKLDNN into default mxnet pip package

2018-10-17 Thread Alfredo Luque
This is huge. Thanks for working on this. Is there a similar plan for, e.g.,
TensorRT support being ported into the main cuda-9.x packages?

On October 17, 2018 at 2:10:20 PM, Alex Zai (aza...@gmail.com) wrote:

Hey all,
We have been working hard these past few months to integrate and stabilize
Intel’s MKLDNN deep learning CPU accelerator into MXNet and have made
incredible progress. On CPUs with AVX512 instructions (such as c5.18x) we
have seen performance increase up to 12x, and on other platforms (Macs,
AVX2) we have seen a speedup of 1.5x or more. A full list of benchmarks can
be found here (
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=95650764
and https://github.com/apache/incubator-mxnet/pull/12591).

Currently, using this accelerator requires the developer to either pip
install the mxnet-mkl version of mxnet or build it themselves from
source. Given that we should try to provide the best performance "out of
the box" with mxnet, we should include this in the default build. The mkldnn
library is included within the pip package build, so it does not require an
external dependency.

There were concerns that MKLDNN could cause regressions on certain
platforms (as it did with the TensorFlow version a while back), but we
added an env flag (MXNET_MKLDNN_ENABLED) that allows users to turn off this
feature at runtime. Please bring up any other concerns you may have, along
with your thoughts on including this accelerator in the default build.

Best,
Alex

—
Alfredo Luque
Software Engineer
Machine Learning Infrastructure
Airbnb
San Francisco, CA


Include MKLDNN into default mxnet pip package

2018-10-17 Thread Alex Zai
Hey all,
We have been working hard these past few months to integrate and stabilize
Intel’s MKLDNN deep learning CPU accelerator into MXNet and have made
incredible progress. On CPUs with AVX512 instructions (such as c5.18x) we
have seen performance increase up to 12x, and on other platforms (Macs,
AVX2) we have seen a speedup of 1.5x or more. A full list of benchmarks can
be found here (
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=95650764
and https://github.com/apache/incubator-mxnet/pull/12591).

Currently, using this accelerator requires the developer to either pip
install the mxnet-mkl version of mxnet or build it themselves from
source. Given that we should try to provide the best performance "out of
the box" with mxnet, we should include this in the default build. The mkldnn
library is included within the pip package build, so it does not require an
external dependency.

There were concerns that MKLDNN could cause regressions on certain
platforms (as it did with the TensorFlow version a while back), but we
added an env flag (MXNET_MKLDNN_ENABLED) that allows users to turn off this
feature at runtime. Please bring up any other concerns you may have, along
with your thoughts on including this accelerator in the default build.

Best,
Alex
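
To make the opt-out concrete, here is a minimal sketch of how a user might run a script with the flag turned off. It assumes that MXNET_MKLDNN_ENABLED accepts "0" and "1" and that it is read when the MXNet process starts, so it is set in the child process environment before launch. The helper names env_with_mkldnn and run_script are hypothetical and are not part of MXNet.

```python
import os
import subprocess
import sys


def env_with_mkldnn(enabled, base=None):
    """Return a copy of the environment with MXNET_MKLDNN_ENABLED set."""
    env = dict(os.environ if base is None else base)
    # Assumption: "0" disables the MKLDNN path and "1" enables it.
    env["MXNET_MKLDNN_ENABLED"] = "1" if enabled else "0"
    return env


def run_script(script_path, enabled):
    """Run a Python script in a child process with the flag applied."""
    completed = subprocess.run(
        [sys.executable, script_path],
        env=env_with_mkldnn(enabled),
        check=False,
    )
    return completed.returncode
```

Running the same script once with enabled=True and once with enabled=False is a simple way to check whether a suspected regression comes from the MKLDNN path.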