Re: Proposal to make MKLDNN as default CPU backend

2020-02-10 Thread Lausen, Leonard
Hi,

as the respective PR has been open for a while and as there has been no follow-
up to Patric's mail, I suggest to merge it once CI passes after Tao's conflict
resolution earlier today.

This gives community members time to test for regressions prior to the 1.7
release. If such were found, we can reconsider the decision.

Best regards
Leonard

[1]: https://github.com/apache/incubator-mxnet/pull/16899

On Wed, 2019-11-20 at 05:27 +, Zhao, Patric wrote:
> Thanks all of the great suggestions. 
> 
> Regarding the binary release, including w/o MKLDNN build, I have summarized a
> table (check attachment).
> 
> - Major changes in python packages, see attached table. 
> - Switch on MKLDNN for no mkl suffix binary in release 1.7 (Red check mark) 
> - Add new mxnet-native build w/o MKLDNN and cuDNN (Yellow background)
>   Track the usage/download in 1-2 releases and then decide if we need it for a
> long time
> - Drop all mkl suffix binary in next major release v2.x.
> 
> Thanks,
> 
> --Patric
> 
> > -Original Message-
> > From: Lin Yuan 
> > Sent: Wednesday, November 20, 2019 5:40 AM
> > To: dev@mxnet.incubator.apache.org
> > Cc: Tao Lv 
> > Subject: Re: Proposal to make MKLDNN as default CPU backend
> > 
> > Also per Sam's suggestion, we could still release a build without MKLDNN
> > (name it mxnet-nomkldnn?) and track the usage/download for one or two
> > releases. If there is no usage, we could drop that build in the future.
> > 
> > Best,
> > 
> > Lin
> > 
> > On Tue, Nov 19, 2019 at 1:23 PM Lin Yuan  wrote:
> > 
> > > Just to summarize base on the concerns Marco raised and discussed
> > abvove:
> > > - AMD CPU (it should work with MKLDNN:
> > > 
> > https://cwiki.apache.org/confluence/display/MXNET/MXNet+with+Intel+M
> > KL
> > > -DNN+-+Performance+Benchmarking
> > > )
> > > - ARM CPU (we don't have it today w/o MKLDNN either)
> > > - Windows (Windows support is there regardless of MKLDNN or not)
> > > - GPU and MKLDNN enabled (already supported)
> > > - Fully reproducible results (medical and financial sector requested
> > > that and we have some flags for cuda) (The nondeterminism exists even
> > > today w/o MKLDNN. We should address it regardless of MLKDNN)
> > > 
> > > Marco, please let us know if your concerns are properly addressed?
> > > 
> > > Given that MKLDNN gives significant performance speed up in CPU, I am
> > > inclined to make it default in pip build.
> > > 
> > > Best,
> > > 
> > > Lin
> > > 
> > > On Tue, Nov 19, 2019 at 8:08 AM Chris Olivier 
> > > wrote:
> > > 
> > > > Thanks, Patric. I was just trying to point out that there was
> > > > currently no guarantee of deterministic results without MKL, so
> > > > there’s not necessarily an expectation of determinism with MKL (ie
> > requirement isn’t relaxed).
> > > > On Mon, Nov 18, 2019 at 9:38 PM Zhao, Patric 
> > > > wrote:
> > > > 
> > > > > It may be a concern but little noise can't affect the final results
> > > > > if
> > > > the
> > > > > algorithm is stable in numerical.
> > > > > The MKLDNN backend with mxnet-mkl has been used for 2 years and
> > we
> > > > didn't
> > > > > see the coverage issue caused by multiple threading.
> > > > > In other words, GPU programming mode works well on training where
> > > > > the non-deterministic also exists from multiple threads.
> > > > > 
> > > > > Parts of training accuracy was pasted in the first PR when MKLDNN
> > > > > is integrated.
> > > > > 
> > > > https://github.com/apache/incubator-mxnet/pull/8302#issuecomment-
> > 3596
> > > > 74818
> > > > > In conclusion, it may happen with very little probability. I
> > > > > believe we can get a solution in case it happens someday.
> > > > > 
> > > > > Thanks,
> > > > > 
> > > > > --Patric
> > > > > 
> > > > > 
> > > > > > -Original Message-
> > > > > > From: Chris Olivier 
> > > > > > Sent: Tuesday, November 19, 2019 11:51 AM
> > > > > > To: dev@mxnet.incubator.apache.org
> > > > > > Cc: Tao Lv 
> > > > > > Subject: Re: Proposal to make MKLDNN as default CPU backend
> > > > > > 
> > > > >

RE: Proposal to make MKLDNN as default CPU backend

2019-11-19 Thread Zhao, Patric
Thanks all of the great suggestions. 

Regarding the binary release, including w/o MKLDNN build, I have summarized a 
table (check attachment).

- Major changes in python packages, see attached table. 
- Switch on MKLDNN for no mkl suffix binary in release 1.7 (Red check mark) 
- Add new mxnet-native build w/o MKLDNN and cuDNN (Yellow background)
  Track the usage/download in 1-2 releases and then decide if we need it for a 
long time
- Drop all mkl suffix binary in next major release v2.x.

Thanks,

--Patric

> -Original Message-
> From: Lin Yuan 
> Sent: Wednesday, November 20, 2019 5:40 AM
> To: dev@mxnet.incubator.apache.org
> Cc: Tao Lv 
> Subject: Re: Proposal to make MKLDNN as default CPU backend
> 
> Also per Sam's suggestion, we could still release a build without MKLDNN
> (name it mxnet-nomkldnn?) and track the usage/download for one or two
> releases. If there is no usage, we could drop that build in the future.
> 
> Best,
> 
> Lin
> 
> On Tue, Nov 19, 2019 at 1:23 PM Lin Yuan  wrote:
> 
> > Just to summarize base on the concerns Marco raised and discussed
> abvove:
> >
> > - AMD CPU (it should work with MKLDNN:
> >
> https://cwiki.apache.org/confluence/display/MXNET/MXNet+with+Intel+M
> KL
> > -DNN+-+Performance+Benchmarking
> > )
> > - ARM CPU (we don't have it today w/o MKLDNN either)
> > - Windows (Windows support is there regardless of MKLDNN or not)
> > - GPU and MKLDNN enabled (already supported)
> > - Fully reproducible results (medical and financial sector requested
> > that and we have some flags for cuda) (The nondeterminism exists even
> > today w/o MKLDNN. We should address it regardless of MLKDNN)
> >
> > Marco, please let us know if your concerns are properly addressed?
> >
> > Given that MKLDNN gives significant performance speed up in CPU, I am
> > inclined to make it default in pip build.
> >
> > Best,
> >
> > Lin
> >
> > On Tue, Nov 19, 2019 at 8:08 AM Chris Olivier 
> > wrote:
> >
> >> Thanks, Patric. I was just trying to point out that there was
> >> currently no guarantee of deterministic results without MKL, so
> >> there’s not necessarily an expectation of determinism with MKL (ie
> requirement isn’t relaxed).
> >>
> >> On Mon, Nov 18, 2019 at 9:38 PM Zhao, Patric 
> >> wrote:
> >>
> >> > It may be a concern but little noise can't affect the final results
> >> > if
> >> the
> >> > algorithm is stable in numerical.
> >> > The MKLDNN backend with mxnet-mkl has been used for 2 years and
> we
> >> didn't
> >> > see the coverage issue caused by multiple threading.
> >> > In other words, GPU programming mode works well on training where
> >> > the non-deterministic also exists from multiple threads.
> >> >
> >> > Parts of training accuracy was pasted in the first PR when MKLDNN
> >> > is integrated.
> >> >
> >> https://github.com/apache/incubator-mxnet/pull/8302#issuecomment-
> 3596
> >> 74818
> >> >
> >> > In conclusion, it may happen with very little probability. I
> >> > believe we can get a solution in case it happens someday.
> >> >
> >> > Thanks,
> >> >
> >> > --Patric
> >> >
> >> >
> >> > > -Original Message-
> >> > > From: Chris Olivier 
> >> > > Sent: Tuesday, November 19, 2019 11:51 AM
> >> > > To: dev@mxnet.incubator.apache.org
> >> > > Cc: Tao Lv 
> >> > > Subject: Re: Proposal to make MKLDNN as default CPU backend
> >> > >
> >> > > (for non mkl dropout, for instance)
> >> > >
> >> > > On Mon, Nov 18, 2019 at 7:50 PM Chris Olivier
> >> > > 
> >> > > wrote:
> >> > >
> >> > > > To address the deterministic item, I know for a fact that
> >> > > > training will not be deterministic in some cases where the “parallel
> random”
> >> > > > class is utilized in parallel threads, such as OMP, if the
> >> > > > number of cores is different, even with the same seed, because
> >> > > > threads are seeded independently and different number of
> >> > > > threads will end up generating different random number
> >> > > > sequences. Dropout operator being
> >> > > an example.
> >> > > >
> >> > > > On Mon, Nov 18, 2019 at 6:39 PM Alfredo Luque
>

Re: Proposal to make MKLDNN as default CPU backend

2019-11-19 Thread Lin Yuan
Also per Sam's suggestion, we could still release a build without MKLDNN
(name it mxnet-nomkldnn?) and track the usage/download for one or two
releases. If there is no usage, we could drop that build in the future.

Best,

Lin

On Tue, Nov 19, 2019 at 1:23 PM Lin Yuan  wrote:

> Just to summarize base on the concerns Marco raised and discussed abvove:
>
> - AMD CPU (it should work with MKLDNN:
> https://cwiki.apache.org/confluence/display/MXNET/MXNet+with+Intel+MKL-DNN+-+Performance+Benchmarking
> )
> - ARM CPU (we don't have it today w/o MKLDNN either)
> - Windows (Windows support is there regardless of MKLDNN or not)
> - GPU and MKLDNN enabled (already supported)
> - Fully reproducible results (medical and financial sector requested that
> and we have some flags for cuda) (The nondeterminism exists even today w/o
> MKLDNN. We should address it regardless of MLKDNN)
>
> Marco, please let us know if your concerns are properly addressed?
>
> Given that MKLDNN gives significant performance speed up in CPU, I am
> inclined to make it default in pip build.
>
> Best,
>
> Lin
>
> On Tue, Nov 19, 2019 at 8:08 AM Chris Olivier 
> wrote:
>
>> Thanks, Patric. I was just trying to point out that there was currently no
>> guarantee of deterministic results without MKL, so there’s not necessarily
>> an expectation of determinism with MKL (ie requirement isn’t relaxed).
>>
>> On Mon, Nov 18, 2019 at 9:38 PM Zhao, Patric 
>> wrote:
>>
>> > It may be a concern but little noise can't affect the final results if
>> the
>> > algorithm is stable in numerical.
>> > The MKLDNN backend with mxnet-mkl has been used for 2 years and we
>> didn't
>> > see the coverage issue caused by multiple threading.
>> > In other words, GPU programming mode works well on training where the
>> > non-deterministic also exists from multiple threads.
>> >
>> > Parts of training accuracy was pasted in the first PR when MKLDNN is
>> > integrated.
>> >
>> https://github.com/apache/incubator-mxnet/pull/8302#issuecomment-359674818
>> >
>> > In conclusion, it may happen with very little probability. I believe we
>> > can get a solution in case it happens someday.
>> >
>> > Thanks,
>> >
>> > --Patric
>> >
>> >
>> > > -Original Message-
>> > > From: Chris Olivier 
>> > > Sent: Tuesday, November 19, 2019 11:51 AM
>> > > To: dev@mxnet.incubator.apache.org
>> > > Cc: Tao Lv 
>> > > Subject: Re: Proposal to make MKLDNN as default CPU backend
>> > >
>> > > (for non mkl dropout, for instance)
>> > >
>> > > On Mon, Nov 18, 2019 at 7:50 PM Chris Olivier 
>> > > wrote:
>> > >
>> > > > To address the deterministic item, I know for a fact that training
>> > > > will not be deterministic in some cases where the “parallel random”
>> > > > class is utilized in parallel threads, such as OMP, if the number of
>> > > > cores is different, even with the same seed, because threads are
>> > > > seeded independently and different number of threads will end up
>> > > > generating different random number sequences. Dropout operator being
>> > > an example.
>> > > >
>> > > > On Mon, Nov 18, 2019 at 6:39 PM Alfredo Luque
>> > > >  wrote:
>> > > >
>> > > >> For AMD CPUs, you’d want to perform validation because now MKL-DNN
>> > > >> would be enabled by default. Historically, other intel libraries
>> > > >> (along with the ICC
>> > > >> compiler) have had performance issues on AMD CPUs. It’s just worth
>> > > >> double checking to make sure that’s not the case here. Perhaps some
>> > > >> MKL-DNN authors can chime in though. It’s not sufficient to double
>> > > >> check that an
>> > > >> AVX2 package passes tests.
>> > > >>
>> > > >> Agreed in the case we’re not releasing ARM binaries.
>> > > >>
>> > > >> The reproducibility argument is around the results being
>> numerically
>> > > >> reproducible. That is, eg; if I train a model with some fixed set
>> of
>> > > >> data, some random seed, etc. and then run inference on it do I get
>> > > >> the exact same floating point values for the weights and results?
>> > > >> Does MxNet already offer this without MKL-DNN?
>> > > 

Re: Proposal to make MKLDNN as default CPU backend

2019-11-19 Thread Lin Yuan
Just to summarize base on the concerns Marco raised and discussed abvove:

- AMD CPU (it should work with MKLDNN:
https://cwiki.apache.org/confluence/display/MXNET/MXNet+with+Intel+MKL-DNN+-+Performance+Benchmarking
)
- ARM CPU (we don't have it today w/o MKLDNN either)
- Windows (Windows support is there regardless of MKLDNN or not)
- GPU and MKLDNN enabled (already supported)
- Fully reproducible results (medical and financial sector requested that
and we have some flags for cuda) (The nondeterminism exists even today w/o
MKLDNN. We should address it regardless of MLKDNN)

Marco, please let us know if your concerns are properly addressed?

Given that MKLDNN gives significant performance speed up in CPU, I am
inclined to make it default in pip build.

Best,

Lin

On Tue, Nov 19, 2019 at 8:08 AM Chris Olivier  wrote:

> Thanks, Patric. I was just trying to point out that there was currently no
> guarantee of deterministic results without MKL, so there’s not necessarily
> an expectation of determinism with MKL (ie requirement isn’t relaxed).
>
> On Mon, Nov 18, 2019 at 9:38 PM Zhao, Patric 
> wrote:
>
> > It may be a concern but little noise can't affect the final results if
> the
> > algorithm is stable in numerical.
> > The MKLDNN backend with mxnet-mkl has been used for 2 years and we didn't
> > see the coverage issue caused by multiple threading.
> > In other words, GPU programming mode works well on training where the
> > non-deterministic also exists from multiple threads.
> >
> > Parts of training accuracy was pasted in the first PR when MKLDNN is
> > integrated.
> >
> https://github.com/apache/incubator-mxnet/pull/8302#issuecomment-359674818
> >
> > In conclusion, it may happen with very little probability. I believe we
> > can get a solution in case it happens someday.
> >
> > Thanks,
> >
> > --Patric
> >
> >
> > > -Original Message-
> > > From: Chris Olivier 
> > > Sent: Tuesday, November 19, 2019 11:51 AM
> > > To: dev@mxnet.incubator.apache.org
> > > Cc: Tao Lv 
> > > Subject: Re: Proposal to make MKLDNN as default CPU backend
> > >
> > > (for non mkl dropout, for instance)
> > >
> > > On Mon, Nov 18, 2019 at 7:50 PM Chris Olivier 
> > > wrote:
> > >
> > > > To address the deterministic item, I know for a fact that training
> > > > will not be deterministic in some cases where the “parallel random”
> > > > class is utilized in parallel threads, such as OMP, if the number of
> > > > cores is different, even with the same seed, because threads are
> > > > seeded independently and different number of threads will end up
> > > > generating different random number sequences. Dropout operator being
> > > an example.
> > > >
> > > > On Mon, Nov 18, 2019 at 6:39 PM Alfredo Luque
> > > >  wrote:
> > > >
> > > >> For AMD CPUs, you’d want to perform validation because now MKL-DNN
> > > >> would be enabled by default. Historically, other intel libraries
> > > >> (along with the ICC
> > > >> compiler) have had performance issues on AMD CPUs. It’s just worth
> > > >> double checking to make sure that’s not the case here. Perhaps some
> > > >> MKL-DNN authors can chime in though. It’s not sufficient to double
> > > >> check that an
> > > >> AVX2 package passes tests.
> > > >>
> > > >> Agreed in the case we’re not releasing ARM binaries.
> > > >>
> > > >> The reproducibility argument is around the results being numerically
> > > >> reproducible. That is, eg; if I train a model with some fixed set of
> > > >> data, some random seed, etc. and then run inference on it do I get
> > > >> the exact same floating point values for the weights and results?
> > > >> Does MxNet already offer this without MKL-DNN?
> > > >>
> > > >> On November 18, 2019 at 6:32:07 PM, Tao Lv (mutou...@gmail.com)
> > > wrote:
> > > >>
> > > >> Regarding the cases listed by Marco:
> > > >> - AMD CPU
> > > >> From my architecture knowledge, what works on C4 instances (with
> AVX2
> > > >> support) should also work well on m5a, right? I think mxnet-mkl and
> > > >> mxnet-cuxxmkl packages have been fully validated on AVX2 machines.
> > > >> Also, we didn't perform any validation on AMD CPU before, why we
> need
> > > >> do that for this time?
> > > >>

Re: Proposal to make MKLDNN as default CPU backend

2019-11-19 Thread Chris Olivier
Thanks, Patric. I was just trying to point out that there was currently no
guarantee of deterministic results without MKL, so there’s not necessarily
an expectation of determinism with MKL (ie requirement isn’t relaxed).

On Mon, Nov 18, 2019 at 9:38 PM Zhao, Patric  wrote:

> It may be a concern but little noise can't affect the final results if the
> algorithm is stable in numerical.
> The MKLDNN backend with mxnet-mkl has been used for 2 years and we didn't
> see the coverage issue caused by multiple threading.
> In other words, GPU programming mode works well on training where the
> non-deterministic also exists from multiple threads.
>
> Parts of training accuracy was pasted in the first PR when MKLDNN is
> integrated.
> https://github.com/apache/incubator-mxnet/pull/8302#issuecomment-359674818
>
> In conclusion, it may happen with very little probability. I believe we
> can get a solution in case it happens someday.
>
> Thanks,
>
> --Patric
>
>
> > -Original Message-
> > From: Chris Olivier 
> > Sent: Tuesday, November 19, 2019 11:51 AM
> > To: dev@mxnet.incubator.apache.org
> > Cc: Tao Lv 
> > Subject: Re: Proposal to make MKLDNN as default CPU backend
> >
> > (for non mkl dropout, for instance)
> >
> > On Mon, Nov 18, 2019 at 7:50 PM Chris Olivier 
> > wrote:
> >
> > > To address the deterministic item, I know for a fact that training
> > > will not be deterministic in some cases where the “parallel random”
> > > class is utilized in parallel threads, such as OMP, if the number of
> > > cores is different, even with the same seed, because threads are
> > > seeded independently and different number of threads will end up
> > > generating different random number sequences. Dropout operator being
> > an example.
> > >
> > > On Mon, Nov 18, 2019 at 6:39 PM Alfredo Luque
> > >  wrote:
> > >
> > >> For AMD CPUs, you’d want to perform validation because now MKL-DNN
> > >> would be enabled by default. Historically, other intel libraries
> > >> (along with the ICC
> > >> compiler) have had performance issues on AMD CPUs. It’s just worth
> > >> double checking to make sure that’s not the case here. Perhaps some
> > >> MKL-DNN authors can chime in though. It’s not sufficient to double
> > >> check that an
> > >> AVX2 package passes tests.
> > >>
> > >> Agreed in the case we’re not releasing ARM binaries.
> > >>
> > >> The reproducibility argument is around the results being numerically
> > >> reproducible. That is, eg; if I train a model with some fixed set of
> > >> data, some random seed, etc. and then run inference on it do I get
> > >> the exact same floating point values for the weights and results?
> > >> Does MxNet already offer this without MKL-DNN?
> > >>
> > >> On November 18, 2019 at 6:32:07 PM, Tao Lv (mutou...@gmail.com)
> > wrote:
> > >>
> > >> Regarding the cases listed by Marco:
> > >> - AMD CPU
> > >> From my architecture knowledge, what works on C4 instances (with AVX2
> > >> support) should also work well on m5a, right? I think mxnet-mkl and
> > >> mxnet-cuxxmkl packages have been fully validated on AVX2 machines.
> > >> Also, we didn't perform any validation on AMD CPU before, why we need
> > >> do that for this time?
> > >>
> > >> - ARM CPU
> > >> I don't know we're releasing any convenience binaries for ARM CPU.
> > >> This proposal mainly targets those pypi packages.
> > >>
> > >> - Windows
> > >> Already validated by CI. We're also releasing mxnet-mkl packages for
> Win.
> > >>
> > >> - GPU and MKLDNN enabled
> > >> Already validated by CI and mxnet-cuxxmkl packages have been released
> > >> for several versions.
> > >>
> > >> - Fully reproducible results (medical and financial sector requested
> > >> that and we have some flags for cuda) Not sure I understand this
> > >> case. We already have MKL-DNN backend for a while. Functionality and
> > >> correctness of it have been verified by MXNet users.
> > >>
> > >> -tao
> > >>
> > >> On Tue, Nov 19, 2019 at 4:41 AM Marco de Abreu
> > >> 
> > >> wrote:
> > >>
> > >> > Sorry, my intent with the "non-standard" phrase was not about
> > >> > general
> > >> MXNet
> > >

RE: Proposal to make MKLDNN as default CPU backend

2019-11-18 Thread Zhao, Patric
It may be a concern but little noise can't affect the final results if the 
algorithm is stable in numerical.  
The MKLDNN backend with mxnet-mkl has been used for 2 years and we didn't see 
the coverage issue caused by multiple threading.
In other words, GPU programming mode works well on training where the 
non-deterministic also exists from multiple threads.

Parts of training accuracy was pasted in the first PR when MKLDNN is integrated.
https://github.com/apache/incubator-mxnet/pull/8302#issuecomment-359674818

In conclusion, it may happen with very little probability. I believe we can get 
a solution in case it happens someday.

Thanks,

--Patric


> -Original Message-
> From: Chris Olivier 
> Sent: Tuesday, November 19, 2019 11:51 AM
> To: dev@mxnet.incubator.apache.org
> Cc: Tao Lv 
> Subject: Re: Proposal to make MKLDNN as default CPU backend
> 
> (for non mkl dropout, for instance)
> 
> On Mon, Nov 18, 2019 at 7:50 PM Chris Olivier 
> wrote:
> 
> > To address the deterministic item, I know for a fact that training
> > will not be deterministic in some cases where the “parallel random”
> > class is utilized in parallel threads, such as OMP, if the number of
> > cores is different, even with the same seed, because threads are
> > seeded independently and different number of threads will end up
> > generating different random number sequences. Dropout operator being
> an example.
> >
> > On Mon, Nov 18, 2019 at 6:39 PM Alfredo Luque
> >  wrote:
> >
> >> For AMD CPUs, you’d want to perform validation because now MKL-DNN
> >> would be enabled by default. Historically, other intel libraries
> >> (along with the ICC
> >> compiler) have had performance issues on AMD CPUs. It’s just worth
> >> double checking to make sure that’s not the case here. Perhaps some
> >> MKL-DNN authors can chime in though. It’s not sufficient to double
> >> check that an
> >> AVX2 package passes tests.
> >>
> >> Agreed in the case we’re not releasing ARM binaries.
> >>
> >> The reproducibility argument is around the results being numerically
> >> reproducible. That is, eg; if I train a model with some fixed set of
> >> data, some random seed, etc. and then run inference on it do I get
> >> the exact same floating point values for the weights and results?
> >> Does MxNet already offer this without MKL-DNN?
> >>
> >> On November 18, 2019 at 6:32:07 PM, Tao Lv (mutou...@gmail.com)
> wrote:
> >>
> >> Regarding the cases listed by Marco:
> >> - AMD CPU
> >> From my architecture knowledge, what works on C4 instances (with AVX2
> >> support) should also work well on m5a, right? I think mxnet-mkl and
> >> mxnet-cuxxmkl packages have been fully validated on AVX2 machines.
> >> Also, we didn't perform any validation on AMD CPU before, why we need
> >> do that for this time?
> >>
> >> - ARM CPU
> >> I don't know we're releasing any convenience binaries for ARM CPU.
> >> This proposal mainly targets those pypi packages.
> >>
> >> - Windows
> >> Already validated by CI. We're also releasing mxnet-mkl packages for Win.
> >>
> >> - GPU and MKLDNN enabled
> >> Already validated by CI and mxnet-cuxxmkl packages have been released
> >> for several versions.
> >>
> >> - Fully reproducible results (medical and financial sector requested
> >> that and we have some flags for cuda) Not sure I understand this
> >> case. We already have MKL-DNN backend for a while. Functionality and
> >> correctness of it have been verified by MXNet users.
> >>
> >> -tao
> >>
> >> On Tue, Nov 19, 2019 at 4:41 AM Marco de Abreu
> >> 
> >> wrote:
> >>
> >> > Sorry, my intent with the "non-standard" phrase was not about
> >> > general
> >> MXNet
> >> > but rather from MKLDNNs point of view, considering that it's being
> >> > developed by Intel, I assumed that MKLDNN might consider non-intel
> >> > use-cases non standard.
> >> >
> >> > -Marco
> >> >
> >> > Skalicky, Sam  schrieb am Mo., 18. Nov.
> >> 2019,
> >> > 21:34:
> >> >
> >> > > Thanks Alfredo, if you can create a GitHub issue with notes/steps
> >> > > we
> >> can
> >> > > add this to the todo list for integrating with the MXNet CI to
> >> > > test on
> >> > m5a
> >> > > instances too. 

Re: Proposal to make MKLDNN as default CPU backend

2019-11-18 Thread Chris Olivier
(for non mkl dropout, for instance)

On Mon, Nov 18, 2019 at 7:50 PM Chris Olivier  wrote:

> To address the deterministic item, I know for a fact that training will
> not be deterministic in some cases where the “parallel random” class is
> utilized in parallel threads, such as OMP, if the number of cores is
> different, even with the same seed, because threads are seeded
> independently and different number of threads will end up generating
> different random number sequences. Dropout operator being an example.
>
> On Mon, Nov 18, 2019 at 6:39 PM Alfredo Luque
>  wrote:
>
>> For AMD CPUs, you’d want to perform validation because now MKL-DNN would
>> be
>> enabled by default. Historically, other intel libraries (along with the
>> ICC
>> compiler) have had performance issues on AMD CPUs. It’s just worth double
>> checking to make sure that’s not the case here. Perhaps some MKL-DNN
>> authors can chime in though. It’s not sufficient to double check that an
>> AVX2 package passes tests.
>>
>> Agreed in the case we’re not releasing ARM binaries.
>>
>> The reproducibility argument is around the results being numerically
>> reproducible. That is, eg; if I train a model with some fixed set of data,
>> some random seed, etc. and then run inference on it do I get the exact
>> same
>> floating point values for the weights and results? Does MxNet already
>> offer
>> this without MKL-DNN?
>>
>> On November 18, 2019 at 6:32:07 PM, Tao Lv (mutou...@gmail.com) wrote:
>>
>> Regarding the cases listed by Marco:
>> - AMD CPU
>> From my architecture knowledge, what works on C4 instances (with AVX2
>> support) should also work well on m5a, right? I think mxnet-mkl and
>> mxnet-cuxxmkl packages have been fully validated on AVX2 machines.
>> Also, we didn't perform any validation on AMD CPU before, why we need do
>> that for this time?
>>
>> - ARM CPU
>> I don't know we're releasing any convenience binaries for ARM CPU. This
>> proposal mainly targets those pypi packages.
>>
>> - Windows
>> Already validated by CI. We're also releasing mxnet-mkl packages for Win.
>>
>> - GPU and MKLDNN enabled
>> Already validated by CI and mxnet-cuxxmkl packages have been released for
>> several versions.
>>
>> - Fully reproducible results (medical and financial sector requested that
>> and we have some flags for cuda)
>> Not sure I understand this case. We already have MKL-DNN backend for a
>> while. Functionality and correctness of it have been verified by MXNet
>> users.
>>
>> -tao
>>
>> On Tue, Nov 19, 2019 at 4:41 AM Marco de Abreu 
>> wrote:
>>
>> > Sorry, my intent with the "non-standard" phrase was not about general
>> MXNet
>> > but rather from MKLDNNs point of view, considering that it's being
>> > developed by Intel, I assumed that MKLDNN might consider non-intel
>> > use-cases non standard.
>> >
>> > -Marco
>> >
>> > Skalicky, Sam  schrieb am Mo., 18. Nov.
>> 2019,
>> > 21:34:
>> >
>> > > Thanks Alfredo, if you can create a GitHub issue with notes/steps we
>> can
>> > > add this to the todo list for integrating with the MXNet CI to test on
>> > m5a
>> > > instances too. Then we can start tracking this on a regular basis. It
>> > would
>> > > be great to actually test on ARM instances now that AWS has A1
>> instances
>> > > too…..ill add it to the wish list ;-D
>> > >
>> > > Sam
>> > >
>> > > > On Nov 18, 2019, at 12:32 PM, Alfredo Luque <
>> alfredo.lu...@airbnb.com
>> > .INVALID>
>> > > wrote:
>> > > >
>> > > > Happy to run some benchmarks on an AWS m5a instance (Epyc) and first
>> > > > generation AMD Threadripper Gen 1 if someone has something easy to
>> run
>> > > and
>> > > > representative.
>> > > >
>> > > > On November 18, 2019 at 12:29:31 PM, Skalicky, Sam (
>> > > > sska...@amazon.com.invalid) wrote:
>> > > >
>> > > > Thanks a good idea Alfredo, are you able to help test on AMD CPUs?
>> Or
>> > is
>> > > > there someone else in the mxnet dev@ community who can help?
>> > > >
>> > > > Sam
>> > > >
>> > > >> On Nov 18, 2019, at 12:27 PM, Alfredo Luque
>> > > >  wrote:
>> > > >>
>> > > >> Verifying that there isn’t a slowdown on AMD CPUs (eg; Ryzen /
>> Epyc)
>> > > > would
>> > > >> definitely make sense as a requirement. It seems odd to classify
>> that
>> > as
>> > > > a
>> > > >> “nonstandard” use case.
>> > > >>
>> > > >> On November 18, 2019 at 12:20:33 PM, Skalicky, Sam (
>> > > >> sska...@amazon.com.invalid) wrote:
>> > > >>
>> > > >> Thanks Patric & team for your work over the years to make MXNet
>> fast
>> > > with
>> > > >> MKLDNN!
>> > > >>
>> > > >> I think it would be great to make MKLDNN enabled by default. We
>> will
>> > > need
>> > > >> to continue producing variants without MKLDNN for those who don’t
>> want
>> > > it
>> > > >> (Marco enumerated some use cases). How do you propose to identify
>> the
>> > > pip
>> > > >> wheels with/without MKLDNN? Previously we had: mxnet-mkl and
>> > > > mxnet-cu101mkl
>> > > >> with MKLDNN. If the plain “mxnet” pip wheel now contains MKLDNN
>> what
>> > do
>> > > > you

Re: Proposal to make MKLDNN as default CPU backend

2019-11-18 Thread Chris Olivier
To address the deterministic item, I know for a fact that training will not
be deterministic in some cases where the “parallel random” class is
utilized in parallel threads, such as OMP, if the number of cores is
different, even with the same seed, because threads are seeded
independently and different number of threads will end up generating
different random number sequences. Dropout operator being an example.

On Mon, Nov 18, 2019 at 6:39 PM Alfredo Luque
 wrote:

> For AMD CPUs, you’d want to perform validation because now MKL-DNN would be
> enabled by default. Historically, other intel libraries (along with the ICC
> compiler) have had performance issues on AMD CPUs. It’s just worth double
> checking to make sure that’s not the case here. Perhaps some MKL-DNN
> authors can chime in though. It’s not sufficient to double check that an
> AVX2 package passes tests.
>
> Agreed in the case we’re not releasing ARM binaries.
>
> The reproducibility argument is around the results being numerically
> reproducible. That is, eg; if I train a model with some fixed set of data,
> some random seed, etc. and then run inference on it do I get the exact same
> floating point values for the weights and results? Does MxNet already offer
> this without MKL-DNN?
>
> On November 18, 2019 at 6:32:07 PM, Tao Lv (mutou...@gmail.com) wrote:
>
> Regarding the cases listed by Marco:
> - AMD CPU
> From my architecture knowledge, what works on C4 instances (with AVX2
> support) should also work well on m5a, right? I think mxnet-mkl and
> mxnet-cuxxmkl packages have been fully validated on AVX2 machines.
> Also, we didn't perform any validation on AMD CPU before, why we need do
> that for this time?
>
> - ARM CPU
> I don't know we're releasing any convenience binaries for ARM CPU. This
> proposal mainly targets those pypi packages.
>
> - Windows
> Already validated by CI. We're also releasing mxnet-mkl packages for Win.
>
> - GPU and MKLDNN enabled
> Already validated by CI and mxnet-cuxxmkl packages have been released for
> several versions.
>
> - Fully reproducible results (medical and financial sector requested that
> and we have some flags for cuda)
> Not sure I understand this case. We already have MKL-DNN backend for a
> while. Functionality and correctness of it have been verified by MXNet
> users.
>
> -tao
>
> On Tue, Nov 19, 2019 at 4:41 AM Marco de Abreu 
> wrote:
>
> > Sorry, my intent with the "non-standard" phrase was not about general
> MXNet
> > but rather from MKLDNNs point of view, considering that it's being
> > developed by Intel, I assumed that MKLDNN might consider non-intel
> > use-cases non standard.
> >
> > -Marco
> >
> > Skalicky, Sam  schrieb am Mo., 18. Nov.
> 2019,
> > 21:34:
> >
> > > Thanks Alfredo, if you can create a GitHub issue with notes/steps we
> can
> > > add this to the todo list for integrating with the MXNet CI to test on
> > m5a
> > > instances too. Then we can start tracking this on a regular basis. It
> > would
> > > be great to actually test on ARM instances now that AWS has A1
> instances
> > > too…..ill add it to the wish list ;-D
> > >
> > > Sam
> > >
> > > > On Nov 18, 2019, at 12:32 PM, Alfredo Luque <
> alfredo.lu...@airbnb.com
> > .INVALID>
> > > wrote:
> > > >
> > > > Happy to run some benchmarks on an AWS m5a instance (Epyc) and first
> > > > generation AMD Threadripper Gen 1 if someone has something easy to
> run
> > > and
> > > > representative.
> > > >
> > > > On November 18, 2019 at 12:29:31 PM, Skalicky, Sam (
> > > > sska...@amazon.com.invalid) wrote:
> > > >
> > > > Thanks a good idea Alfredo, are you able to help test on AMD CPUs? Or
> > is
> > > > there someone else in the mxnet dev@ community who can help?
> > > >
> > > > Sam
> > > >
> > > >> On Nov 18, 2019, at 12:27 PM, Alfredo Luque
> > > >  wrote:
> > > >>
> > > >> Verifying that there isn’t a slowdown on AMD CPUs (eg; Ryzen / Epyc)
> > > > would
> > > >> definitely make sense as a requirement. It seems odd to classify
> that
> > as
> > > > a
> > > >> “nonstandard” use case.
> > > >>
> > > >> On November 18, 2019 at 12:20:33 PM, Skalicky, Sam (
> > > >> sska...@amazon.com.invalid) wrote:
> > > >>
> > > >> Thanks Patric & team for your work over the years to make MXNet fast
> > > with
> > > >> MKLDNN!
> > > >>
> > > >> I think it would be great to make MKLDNN enabled by default. We will
> > > need
> > > >> to continue producing variants without MKLDNN for those who don’t
> want
> > > it
> > > >> (Marco enumerated some use cases). How do you propose to identify
> the
> > > pip
> > > >> wheels with/without MKLDNN? Previously we had: mxnet-mkl and
> > > > mxnet-cu101mkl
> > > >> with MKLDNN. If the plain “mxnet” pip wheel now contains MKLDNN what
> > do
> > > > you
> > > >> propose we call the build without MKLDNN? mxnet-nomkl?
> > > >>
> > > >> Thanks!
> > > >> Sam
> > > >>
> > > >>> On Nov 18, 2019, at 11:08 AM, Marco de Abreu <
> > marco.g.ab...@gmail.com>
> > > >> wrote:
> > > >>>
> > > >>> Hi Patric,
> > > >>>
> > 

RE: Proposal to make MKLDNN as default CPU backend

2019-11-18 Thread Zhao, Patric
Thanks for all great inputs. 

Regarding AMD, these're some data in the wiki.
https://cwiki.apache.org/confluence/display/MXNET/MXNet+with+Intel+MKL-DNN+-+Performance+Benchmarking


> -Original Message-
> From: Alfredo Luque 
> Sent: Tuesday, November 19, 2019 10:40 AM
> To: Tao Lv ; dev@mxnet.incubator.apache.org
> Subject: Re: Proposal to make MKLDNN as default CPU backend
> 
> For AMD CPUs, you’d want to perform validation because now MKL-DNN
> would be enabled by default. Historically, other intel libraries (along with 
> the
> ICC
> compiler) have had performance issues on AMD CPUs. It’s just worth double
> checking to make sure that’s not the case here. Perhaps some MKL-DNN
> authors can chime in though. It’s not sufficient to double check that an
> AVX2 package passes tests.
> 
> Agreed in the case we’re not releasing ARM binaries.
> 
> The reproducibility argument is around the results being numerically
> reproducible. That is, eg; if I train a model with some fixed set of data, 
> some
> random seed, etc. and then run inference on it do I get the exact same
> floating point values for the weights and results? Does MxNet already offer
> this without MKL-DNN?
> 
> On November 18, 2019 at 6:32:07 PM, Tao Lv (mutou...@gmail.com) wrote:
> 
> Regarding the cases listed by Marco:
> - AMD CPU
> From my architecture knowledge, what works on C4 instances (with AVX2
> support) should also work well on m5a, right? I think mxnet-mkl and mxnet-
> cuxxmkl packages have been fully validated on AVX2 machines.
> Also, we didn't perform any validation on AMD CPU before, why we need do
> that for this time?
> 
> - ARM CPU
> I don't know we're releasing any convenience binaries for ARM CPU. This
> proposal mainly targets those pypi packages.
> 
> - Windows
> Already validated by CI. We're also releasing mxnet-mkl packages for Win.
> 
> - GPU and MKLDNN enabled
> Already validated by CI and mxnet-cuxxmkl packages have been released for
> several versions.
> 
> - Fully reproducible results (medical and financial sector requested that and
> we have some flags for cuda) Not sure I understand this case. We already
> have MKL-DNN backend for a while. Functionality and correctness of it have
> been verified by MXNet users.
> 
> -tao
> 
> On Tue, Nov 19, 2019 at 4:41 AM Marco de Abreu
> 
> wrote:
> 
> > Sorry, my intent with the "non-standard" phrase was not about general
> MXNet
> > but rather from MKLDNNs point of view, considering that it's being
> > developed by Intel, I assumed that MKLDNN might consider non-intel
> > use-cases non standard.
> >
> > -Marco
> >
> > Skalicky, Sam  schrieb am Mo., 18. Nov.
> > 2019,
> > 21:34:
> >
> > > Thanks Alfredo, if you can create a GitHub issue with notes/steps we
> can
> > > add this to the todo list for integrating with the MXNet CI to test
> > > on
> > m5a
> > > instances too. Then we can start tracking this on a regular basis.
> > > It
> > would
> > > be great to actually test on ARM instances now that AWS has A1
> instances
> > > too…..ill add it to the wish list ;-D
> > >
> > > Sam
> > >
> > > > On Nov 18, 2019, at 12:32 PM, Alfredo Luque
> > > >  > .INVALID>
> > > wrote:
> > > >
> > > > Happy to run some benchmarks on an AWS m5a instance (Epyc) and
> > > > first generation AMD Threadripper Gen 1 if someone has something
> > > > easy to
> run
> > > and
> > > > representative.
> > > >
> > > > On November 18, 2019 at 12:29:31 PM, Skalicky, Sam (
> > > > sska...@amazon.com.invalid) wrote:
> > > >
> > > > Thanks a good idea Alfredo, are you able to help test on AMD CPUs?
> > > > Or
> > is
> > > > there someone else in the mxnet dev@ community who can help?
> > > >
> > > > Sam
> > > >
> > > >> On Nov 18, 2019, at 12:27 PM, Alfredo Luque
> > > >  wrote:
> > > >>
> > > >> Verifying that there isn’t a slowdown on AMD CPUs (eg; Ryzen /
> > > >> Epyc)
> > > > would
> > > >> definitely make sense as a requirement. It seems odd to classify
> that
> > as
> > > > a
> > > >> “nonstandard” use case.
> > > >>
> > > >> On November 18, 2019 at 12:20:33 PM, Skalicky, Sam (
> > > >> sska...@amazon.com.invalid) wrote:
> > > >>
> > > >> Thanks Patric & team for your work over the years to make MXNet
&

Re: Proposal to make MKLDNN as default CPU backend

2019-11-18 Thread Alfredo Luque
For AMD CPUs, you’d want to perform validation because now MKL-DNN would be
enabled by default. Historically, other intel libraries (along with the ICC
compiler) have had performance issues on AMD CPUs. It’s just worth double
checking to make sure that’s not the case here. Perhaps some MKL-DNN
authors can chime in though. It’s not sufficient to double check that an
AVX2 package passes tests.

Agreed in the case we’re not releasing ARM binaries.

The reproducibility argument is around the results being numerically
reproducible. That is, eg; if I train a model with some fixed set of data,
some random seed, etc. and then run inference on it do I get the exact same
floating point values for the weights and results? Does MxNet already offer
this without MKL-DNN?

On November 18, 2019 at 6:32:07 PM, Tao Lv (mutou...@gmail.com) wrote:

Regarding the cases listed by Marco:
- AMD CPU
>From my architecture knowledge, what works on C4 instances (with AVX2
support) should also work well on m5a, right? I think mxnet-mkl and
mxnet-cuxxmkl packages have been fully validated on AVX2 machines.
Also, we didn't perform any validation on AMD CPU before, why we need do
that for this time?

- ARM CPU
I don't know we're releasing any convenience binaries for ARM CPU. This
proposal mainly targets those pypi packages.

- Windows
Already validated by CI. We're also releasing mxnet-mkl packages for Win.

- GPU and MKLDNN enabled
Already validated by CI and mxnet-cuxxmkl packages have been released for
several versions.

- Fully reproducible results (medical and financial sector requested that
and we have some flags for cuda)
Not sure I understand this case. We already have MKL-DNN backend for a
while. Functionality and correctness of it have been verified by MXNet
users.

-tao

On Tue, Nov 19, 2019 at 4:41 AM Marco de Abreu 
wrote:

> Sorry, my intent with the "non-standard" phrase was not about general
MXNet
> but rather from MKLDNNs point of view, considering that it's being
> developed by Intel, I assumed that MKLDNN might consider non-intel
> use-cases non standard.
>
> -Marco
>
> Skalicky, Sam  schrieb am Mo., 18. Nov. 2019,
> 21:34:
>
> > Thanks Alfredo, if you can create a GitHub issue with notes/steps we
can
> > add this to the todo list for integrating with the MXNet CI to test on
> m5a
> > instances too. Then we can start tracking this on a regular basis. It
> would
> > be great to actually test on ARM instances now that AWS has A1
instances
> > too…..ill add it to the wish list ;-D
> >
> > Sam
> >
> > > On Nov 18, 2019, at 12:32 PM, Alfredo Luque  .INVALID>
> > wrote:
> > >
> > > Happy to run some benchmarks on an AWS m5a instance (Epyc) and first
> > > generation AMD Threadripper Gen 1 if someone has something easy to
run
> > and
> > > representative.
> > >
> > > On November 18, 2019 at 12:29:31 PM, Skalicky, Sam (
> > > sska...@amazon.com.invalid) wrote:
> > >
> > > Thanks a good idea Alfredo, are you able to help test on AMD CPUs? Or
> is
> > > there someone else in the mxnet dev@ community who can help?
> > >
> > > Sam
> > >
> > >> On Nov 18, 2019, at 12:27 PM, Alfredo Luque
> > >  wrote:
> > >>
> > >> Verifying that there isn’t a slowdown on AMD CPUs (eg; Ryzen / Epyc)
> > > would
> > >> definitely make sense as a requirement. It seems odd to classify
that
> as
> > > a
> > >> “nonstandard” use case.
> > >>
> > >> On November 18, 2019 at 12:20:33 PM, Skalicky, Sam (
> > >> sska...@amazon.com.invalid) wrote:
> > >>
> > >> Thanks Patric & team for your work over the years to make MXNet fast
> > with
> > >> MKLDNN!
> > >>
> > >> I think it would be great to make MKLDNN enabled by default. We will
> > need
> > >> to continue producing variants without MKLDNN for those who don’t
want
> > it
> > >> (Marco enumerated some use cases). How do you propose to identify
the
> > pip
> > >> wheels with/without MKLDNN? Previously we had: mxnet-mkl and
> > > mxnet-cu101mkl
> > >> with MKLDNN. If the plain “mxnet” pip wheel now contains MKLDNN what
> do
> > > you
> > >> propose we call the build without MKLDNN? mxnet-nomkl?
> > >>
> > >> Thanks!
> > >> Sam
> > >>
> > >>> On Nov 18, 2019, at 11:08 AM, Marco de Abreu <
> marco.g.ab...@gmail.com>
> > >> wrote:
> > >>>
> > >>> Hi Patric,
> > >>>
> > >>> First of all, thanks a lot to you and your team for all the effort
on
> > >> MXNet
> > >>> and mkldnn!
> > >>>
> > >>> Generally I'm inclined towards your proposal, but I'm thinking
about
> > the
> > >>> non-standard use cases:
> > >>> - AMD CPU
> > >>> - ARM CPU
> > >>> - Windows
> > >>> - GPU and MKLDNN enabled
> > >>> - Fully reproducible results (medical and financial sector
requested
> > > that
> > >>> and we have some flags for cuda)
> > >>>
> > >>> Is mkldnn fully compatible with these use cases? If not, what would
> > >> happen?
> > >>> If yes, do we have performance numbers?
> > >>>
> > >>> Best regards,
> > >>> Marco
> > >>>
> > >>> Zhao, Patric  schrieb am Mo., 18. Nov. 2019,
> > >> 14:00:
> > >>>
> >  Hi MXNet community,
> 

Re: Proposal to make MKLDNN as default CPU backend

2019-11-18 Thread Tao Lv
Regarding the cases listed by Marco:
- AMD CPU
>From my architecture knowledge, what works on C4 instances (with AVX2
support) should also work well on m5a, right? I think mxnet-mkl and
mxnet-cuxxmkl packages have been fully validated on AVX2 machines.
Also, we didn't perform any validation on AMD CPU before, why we need do
that for this time?

- ARM CPU
I don't know we're releasing any convenience binaries for ARM CPU. This
proposal mainly targets those pypi packages.

- Windows
Already validated by CI. We're also releasing mxnet-mkl packages for Win.

- GPU and MKLDNN enabled
Already validated by CI and mxnet-cuxxmkl packages have been released for
several versions.

- Fully reproducible results (medical and financial sector requested that
and we have some flags for cuda)
Not sure I understand this case. We already have MKL-DNN backend for a
while. Functionality and correctness of it have been verified by MXNet
users.

-tao

On Tue, Nov 19, 2019 at 4:41 AM Marco de Abreu 
wrote:

> Sorry, my intent with the "non-standard" phrase was not about general MXNet
> but rather from MKLDNNs point of view, considering that it's being
> developed by Intel, I assumed that MKLDNN might consider non-intel
> use-cases non standard.
>
> -Marco
>
> Skalicky, Sam  schrieb am Mo., 18. Nov. 2019,
> 21:34:
>
> > Thanks Alfredo, if you can create a GitHub issue with notes/steps we can
> > add this to the todo list for integrating with the MXNet CI to test on
> m5a
> > instances too. Then we can start tracking this on a regular basis. It
> would
> > be great to actually test on ARM instances now that AWS has A1 instances
> > too…..ill add it to the wish list ;-D
> >
> > Sam
> >
> > > On Nov 18, 2019, at 12:32 PM, Alfredo Luque  .INVALID>
> > wrote:
> > >
> > > Happy to run some benchmarks on an AWS m5a instance (Epyc) and first
> > > generation AMD Threadripper Gen 1 if someone has something easy to run
> > and
> > > representative.
> > >
> > > On November 18, 2019 at 12:29:31 PM, Skalicky, Sam (
> > > sska...@amazon.com.invalid) wrote:
> > >
> > > Thanks a good idea Alfredo, are you able to help test on AMD CPUs? Or
> is
> > > there someone else in the mxnet dev@ community who can help?
> > >
> > > Sam
> > >
> > >> On Nov 18, 2019, at 12:27 PM, Alfredo Luque
> > >  wrote:
> > >>
> > >> Verifying that there isn’t a slowdown on AMD CPUs (eg; Ryzen / Epyc)
> > > would
> > >> definitely make sense as a requirement. It seems odd to classify that
> as
> > > a
> > >> “nonstandard” use case.
> > >>
> > >> On November 18, 2019 at 12:20:33 PM, Skalicky, Sam (
> > >> sska...@amazon.com.invalid) wrote:
> > >>
> > >> Thanks Patric & team for your work over the years to make MXNet fast
> > with
> > >> MKLDNN!
> > >>
> > >> I think it would be great to make MKLDNN enabled by default. We will
> > need
> > >> to continue producing variants without MKLDNN for those who don’t want
> > it
> > >> (Marco enumerated some use cases). How do you propose to identify the
> > pip
> > >> wheels with/without MKLDNN? Previously we had: mxnet-mkl and
> > > mxnet-cu101mkl
> > >> with MKLDNN. If the plain “mxnet” pip wheel now contains MKLDNN what
> do
> > > you
> > >> propose we call the build without MKLDNN? mxnet-nomkl?
> > >>
> > >> Thanks!
> > >> Sam
> > >>
> > >>> On Nov 18, 2019, at 11:08 AM, Marco de Abreu <
> marco.g.ab...@gmail.com>
> > >> wrote:
> > >>>
> > >>> Hi Patric,
> > >>>
> > >>> First of all, thanks a lot to you and your team for all the effort on
> > >> MXNet
> > >>> and mkldnn!
> > >>>
> > >>> Generally I'm inclined towards your proposal, but I'm thinking about
> > the
> > >>> non-standard use cases:
> > >>> - AMD CPU
> > >>> - ARM CPU
> > >>> - Windows
> > >>> - GPU and MKLDNN enabled
> > >>> - Fully reproducible results (medical and financial sector requested
> > > that
> > >>> and we have some flags for cuda)
> > >>>
> > >>> Is mkldnn fully compatible with these use cases? If not, what would
> > >> happen?
> > >>> If yes, do we have performance numbers?
> > >>>
> > >>> Best regards,
> > >>> Marco
> > >>>
> > >>> Zhao, Patric  schrieb am Mo., 18. Nov. 2019,
> > >> 14:00:
> > >>>
> >  Hi MXNet community,
> > 
> >  From the first MKLDNN backend integrated in release 1.2, the
> community
> > >> is
> >  continuously improving the quality and performance of MKLDNN CPU
> > >> backend.
> >  Nowadays, the MKLDNN backend is widely used for the inference,
> > >> especially
> >  for INT8 inference, and we got lots of very positive feedbacks from
> > >> MXNet
> >  users.
> > 
> >  Achieved milestones as below:
> > 
> >  - MKLDNN integrated into Apache MXNet from release 1.2, Feb, 2018
> [1]
> >  - MKLDNN backend as default CPU backend from source building, Jan,
> > 2019
> > >> [2]
> >  - MKLDNN subgraph optimization as default for the inference, Jul,
> 2019
> > >> [3]
> >  - MKLDNN major version upgrade in release 1.6, Oct, 2019 [4]
> > 
> >  To make more successful and technical 

Re: Proposal to make MKLDNN as default CPU backend

2019-11-18 Thread Marco de Abreu
Sorry, my intent with the "non-standard" phrase was not about general MXNet
but rather from MKLDNNs point of view, considering that it's being
developed by Intel, I assumed that MKLDNN might consider non-intel
use-cases non standard.

-Marco

Skalicky, Sam  schrieb am Mo., 18. Nov. 2019,
21:34:

> Thanks Alfredo, if you can create a GitHub issue with notes/steps we can
> add this to the todo list for integrating with the MXNet CI to test on m5a
> instances too. Then we can start tracking this on a regular basis. It would
> be great to actually test on ARM instances now that AWS has A1 instances
> too…..ill add it to the wish list ;-D
>
> Sam
>
> > On Nov 18, 2019, at 12:32 PM, Alfredo Luque 
> > 
> wrote:
> >
> > Happy to run some benchmarks on an AWS m5a instance (Epyc) and first
> > generation AMD Threadripper Gen 1 if someone has something easy to run
> and
> > representative.
> >
> > On November 18, 2019 at 12:29:31 PM, Skalicky, Sam (
> > sska...@amazon.com.invalid) wrote:
> >
> > Thanks a good idea Alfredo, are you able to help test on AMD CPUs? Or is
> > there someone else in the mxnet dev@ community who can help?
> >
> > Sam
> >
> >> On Nov 18, 2019, at 12:27 PM, Alfredo Luque
> >  wrote:
> >>
> >> Verifying that there isn’t a slowdown on AMD CPUs (eg; Ryzen / Epyc)
> > would
> >> definitely make sense as a requirement. It seems odd to classify that as
> > a
> >> “nonstandard” use case.
> >>
> >> On November 18, 2019 at 12:20:33 PM, Skalicky, Sam (
> >> sska...@amazon.com.invalid) wrote:
> >>
> >> Thanks Patric & team for your work over the years to make MXNet fast
> with
> >> MKLDNN!
> >>
> >> I think it would be great to make MKLDNN enabled by default. We will
> need
> >> to continue producing variants without MKLDNN for those who don’t want
> it
> >> (Marco enumerated some use cases). How do you propose to identify the
> pip
> >> wheels with/without MKLDNN? Previously we had: mxnet-mkl and
> > mxnet-cu101mkl
> >> with MKLDNN. If the plain “mxnet” pip wheel now contains MKLDNN what do
> > you
> >> propose we call the build without MKLDNN? mxnet-nomkl?
> >>
> >> Thanks!
> >> Sam
> >>
> >>> On Nov 18, 2019, at 11:08 AM, Marco de Abreu 
> >> wrote:
> >>>
> >>> Hi Patric,
> >>>
> >>> First of all, thanks a lot to you and your team for all the effort on
> >> MXNet
> >>> and mkldnn!
> >>>
> >>> Generally I'm inclined towards your proposal, but I'm thinking about
> the
> >>> non-standard use cases:
> >>> - AMD CPU
> >>> - ARM CPU
> >>> - Windows
> >>> - GPU and MKLDNN enabled
> >>> - Fully reproducible results (medical and financial sector requested
> > that
> >>> and we have some flags for cuda)
> >>>
> >>> Is mkldnn fully compatible with these use cases? If not, what would
> >> happen?
> >>> If yes, do we have performance numbers?
> >>>
> >>> Best regards,
> >>> Marco
> >>>
> >>> Zhao, Patric  schrieb am Mo., 18. Nov. 2019,
> >> 14:00:
> >>>
>  Hi MXNet community,
> 
>  From the first MKLDNN backend integrated in release 1.2, the community
> >> is
>  continuously improving the quality and performance of MKLDNN CPU
> >> backend.
>  Nowadays, the MKLDNN backend is widely used for the inference,
> >> especially
>  for INT8 inference, and we got lots of very positive feedbacks from
> >> MXNet
>  users.
> 
>  Achieved milestones as below:
> 
>  - MKLDNN integrated into Apache MXNet from release 1.2, Feb, 2018 [1]
>  - MKLDNN backend as default CPU backend from source building, Jan,
> 2019
> >> [2]
>  - MKLDNN subgraph optimization as default for the inference, Jul, 2019
> >> [3]
>  - MKLDNN major version upgrade in release 1.6, Oct, 2019 [4]
> 
>  To make more successful and technical leadership for Apache MXNet in
> > the
>  industry, I propose to make MKLDNN as default CPU backend in all
> binary
>  distribution from the next release.
>  The new milestone includes:
> 
>  - Static link MKLDNN library in the binary avoiding the mismatch
> > version
>  in the runtime [5]
>  - Make nightly build with MKLDNN default from master pre 1.7 release
>  - Binary distribution with MKLDNN default from 1.7 release.
> 
>  What will be changed:
> 
>  - mxnet and mxnet-cuXX binary will be built with MKLDNN=1
>  - mxnet-mkl and mxnet-cuXXmkl will be not changed in the minor release
>  (1.x) and plan to remove in next major release (2.0)
> 
>  Suggestions and comments are highly appreciated.
> 
>  Thanks,
> 
>  --Patric
> 
> 
>  [1] https://github.com/apache/incubator-mxnet/pull/9677
>  [2]
> 
> >>
> >
> https://lists.apache.org/thread.html/bfeae6ee46374112eb4dff1470c262959101e4bffb19930926963535@%3Cdev.mxnet.apache.org%3E
>  [3] https://github.com/apache/incubator-mxnet/pull/15518
>  [4]
> 
> >>
> >
> https://lists.apache.org/thread.html/f46ab920f18795496eafe713e6e9e561c684e06189085cec17b401dc@%3Cdev.mxnet.apache.org%3E
>  [5] 

Re: Proposal to make MKLDNN as default CPU backend

2019-11-18 Thread Skalicky, Sam
Thanks Alfredo, if you can create a GitHub issue with notes/steps we can add 
this to the todo list for integrating with the MXNet CI to test on m5a 
instances too. Then we can start tracking this on a regular basis. It would be 
great to actually test on ARM instances now that AWS has A1 instances too…..ill 
add it to the wish list ;-D

Sam

> On Nov 18, 2019, at 12:32 PM, Alfredo Luque 
>  wrote:
> 
> Happy to run some benchmarks on an AWS m5a instance (Epyc) and first
> generation AMD Threadripper Gen 1 if someone has something easy to run and
> representative.
> 
> On November 18, 2019 at 12:29:31 PM, Skalicky, Sam (
> sska...@amazon.com.invalid) wrote:
> 
> Thanks a good idea Alfredo, are you able to help test on AMD CPUs? Or is
> there someone else in the mxnet dev@ community who can help?
> 
> Sam
> 
>> On Nov 18, 2019, at 12:27 PM, Alfredo Luque
>  wrote:
>> 
>> Verifying that there isn’t a slowdown on AMD CPUs (eg; Ryzen / Epyc)
> would
>> definitely make sense as a requirement. It seems odd to classify that as
> a
>> “nonstandard” use case.
>> 
>> On November 18, 2019 at 12:20:33 PM, Skalicky, Sam (
>> sska...@amazon.com.invalid) wrote:
>> 
>> Thanks Patric & team for your work over the years to make MXNet fast with
>> MKLDNN!
>> 
>> I think it would be great to make MKLDNN enabled by default. We will need
>> to continue producing variants without MKLDNN for those who don’t want it
>> (Marco enumerated some use cases). How do you propose to identify the pip
>> wheels with/without MKLDNN? Previously we had: mxnet-mkl and
> mxnet-cu101mkl
>> with MKLDNN. If the plain “mxnet” pip wheel now contains MKLDNN what do
> you
>> propose we call the build without MKLDNN? mxnet-nomkl?
>> 
>> Thanks!
>> Sam
>> 
>>> On Nov 18, 2019, at 11:08 AM, Marco de Abreu 
>> wrote:
>>> 
>>> Hi Patric,
>>> 
>>> First of all, thanks a lot to you and your team for all the effort on
>> MXNet
>>> and mkldnn!
>>> 
>>> Generally I'm inclined towards your proposal, but I'm thinking about the
>>> non-standard use cases:
>>> - AMD CPU
>>> - ARM CPU
>>> - Windows
>>> - GPU and MKLDNN enabled
>>> - Fully reproducible results (medical and financial sector requested
> that
>>> and we have some flags for cuda)
>>> 
>>> Is mkldnn fully compatible with these use cases? If not, what would
>> happen?
>>> If yes, do we have performance numbers?
>>> 
>>> Best regards,
>>> Marco
>>> 
>>> Zhao, Patric  schrieb am Mo., 18. Nov. 2019,
>> 14:00:
>>> 
 Hi MXNet community,
 
 From the first MKLDNN backend integrated in release 1.2, the community
>> is
 continuously improving the quality and performance of MKLDNN CPU
>> backend.
 Nowadays, the MKLDNN backend is widely used for the inference,
>> especially
 for INT8 inference, and we got lots of very positive feedbacks from
>> MXNet
 users.
 
 Achieved milestones as below:
 
 - MKLDNN integrated into Apache MXNet from release 1.2, Feb, 2018 [1]
 - MKLDNN backend as default CPU backend from source building, Jan, 2019
>> [2]
 - MKLDNN subgraph optimization as default for the inference, Jul, 2019
>> [3]
 - MKLDNN major version upgrade in release 1.6, Oct, 2019 [4]
 
 To make more successful and technical leadership for Apache MXNet in
> the
 industry, I propose to make MKLDNN as default CPU backend in all binary
 distribution from the next release.
 The new milestone includes:
 
 - Static link MKLDNN library in the binary avoiding the mismatch
> version
 in the runtime [5]
 - Make nightly build with MKLDNN default from master pre 1.7 release
 - Binary distribution with MKLDNN default from 1.7 release.
 
 What will be changed:
 
 - mxnet and mxnet-cuXX binary will be built with MKLDNN=1
 - mxnet-mkl and mxnet-cuXXmkl will be not changed in the minor release
 (1.x) and plan to remove in next major release (2.0)
 
 Suggestions and comments are highly appreciated.
 
 Thanks,
 
 --Patric
 
 
 [1] https://github.com/apache/incubator-mxnet/pull/9677
 [2]
 
>> 
> https://lists.apache.org/thread.html/bfeae6ee46374112eb4dff1470c262959101e4bffb19930926963535@%3Cdev.mxnet.apache.org%3E
 [3] https://github.com/apache/incubator-mxnet/pull/15518
 [4]
 
>> 
> https://lists.apache.org/thread.html/f46ab920f18795496eafe713e6e9e561c684e06189085cec17b401dc@%3Cdev.mxnet.apache.org%3E
 [5] https://github.com/apache/incubator-mxnet/pull/16731
 
>> 
>> —
>> Alfredo Luque
>> Software Engineer
>> Machine Learning Infrastructure
>> Airbnb
>> San Francisco, CA
> 
> —
> Alfredo Luque
> Software Engineer
> Machine Learning Infrastructure
> Airbnb
> San Francisco, CA



Re: Proposal to make MKLDNN as default CPU backend

2019-11-18 Thread Alfredo Luque
Happy to run some benchmarks on an AWS m5a instance (Epyc) and first
generation AMD Threadripper Gen 1 if someone has something easy to run and
representative.

On November 18, 2019 at 12:29:31 PM, Skalicky, Sam (
sska...@amazon.com.invalid) wrote:

Thanks a good idea Alfredo, are you able to help test on AMD CPUs? Or is
there someone else in the mxnet dev@ community who can help?

Sam

> On Nov 18, 2019, at 12:27 PM, Alfredo Luque
 wrote:
>
> Verifying that there isn’t a slowdown on AMD CPUs (eg; Ryzen / Epyc)
would
> definitely make sense as a requirement. It seems odd to classify that as
a
> “nonstandard” use case.
>
> On November 18, 2019 at 12:20:33 PM, Skalicky, Sam (
> sska...@amazon.com.invalid) wrote:
>
> Thanks Patric & team for your work over the years to make MXNet fast with
> MKLDNN!
>
> I think it would be great to make MKLDNN enabled by default. We will need
> to continue producing variants without MKLDNN for those who don’t want it
> (Marco enumerated some use cases). How do you propose to identify the pip
> wheels with/without MKLDNN? Previously we had: mxnet-mkl and
mxnet-cu101mkl
> with MKLDNN. If the plain “mxnet” pip wheel now contains MKLDNN what do
you
> propose we call the build without MKLDNN? mxnet-nomkl?
>
> Thanks!
> Sam
>
>> On Nov 18, 2019, at 11:08 AM, Marco de Abreu 
> wrote:
>>
>> Hi Patric,
>>
>> First of all, thanks a lot to you and your team for all the effort on
> MXNet
>> and mkldnn!
>>
>> Generally I'm inclined towards your proposal, but I'm thinking about the
>> non-standard use cases:
>> - AMD CPU
>> - ARM CPU
>> - Windows
>> - GPU and MKLDNN enabled
>> - Fully reproducible results (medical and financial sector requested
that
>> and we have some flags for cuda)
>>
>> Is mkldnn fully compatible with these use cases? If not, what would
> happen?
>> If yes, do we have performance numbers?
>>
>> Best regards,
>> Marco
>>
>> Zhao, Patric  schrieb am Mo., 18. Nov. 2019,
> 14:00:
>>
>>> Hi MXNet community,
>>>
>>> From the first MKLDNN backend integrated in release 1.2, the community
> is
>>> continuously improving the quality and performance of MKLDNN CPU
> backend.
>>> Nowadays, the MKLDNN backend is widely used for the inference,
> especially
>>> for INT8 inference, and we got lots of very positive feedbacks from
> MXNet
>>> users.
>>>
>>> Achieved milestones as below:
>>>
>>> - MKLDNN integrated into Apache MXNet from release 1.2, Feb, 2018 [1]
>>> - MKLDNN backend as default CPU backend from source building, Jan, 2019
> [2]
>>> - MKLDNN subgraph optimization as default for the inference, Jul, 2019
> [3]
>>> - MKLDNN major version upgrade in release 1.6, Oct, 2019 [4]
>>>
>>> To make more successful and technical leadership for Apache MXNet in
the
>>> industry, I propose to make MKLDNN as default CPU backend in all binary
>>> distribution from the next release.
>>> The new milestone includes:
>>>
>>> - Static link MKLDNN library in the binary avoiding the mismatch
version
>>> in the runtime [5]
>>> - Make nightly build with MKLDNN default from master pre 1.7 release
>>> - Binary distribution with MKLDNN default from 1.7 release.
>>>
>>> What will be changed:
>>>
>>> - mxnet and mxnet-cuXX binary will be built with MKLDNN=1
>>> - mxnet-mkl and mxnet-cuXXmkl will be not changed in the minor release
>>> (1.x) and plan to remove in next major release (2.0)
>>>
>>> Suggestions and comments are highly appreciated.
>>>
>>> Thanks,
>>>
>>> --Patric
>>>
>>>
>>> [1] https://github.com/apache/incubator-mxnet/pull/9677
>>> [2]
>>>
>
https://lists.apache.org/thread.html/bfeae6ee46374112eb4dff1470c262959101e4bffb19930926963535@%3Cdev.mxnet.apache.org%3E
>>> [3] https://github.com/apache/incubator-mxnet/pull/15518
>>> [4]
>>>
>
https://lists.apache.org/thread.html/f46ab920f18795496eafe713e6e9e561c684e06189085cec17b401dc@%3Cdev.mxnet.apache.org%3E
>>> [5] https://github.com/apache/incubator-mxnet/pull/16731
>>>
>
> —
> Alfredo Luque
> Software Engineer
> Machine Learning Infrastructure
> Airbnb
> San Francisco, CA

—
Alfredo Luque
Software Engineer
Machine Learning Infrastructure
Airbnb
San Francisco, CA


Re: Proposal to make MKLDNN as default CPU backend

2019-11-18 Thread Skalicky, Sam
Thanks a good idea Alfredo, are you able to help test on AMD CPUs? Or is there 
someone else in the mxnet dev@ community who can help?

Sam

> On Nov 18, 2019, at 12:27 PM, Alfredo Luque 
>  wrote:
> 
> Verifying that there isn’t a slowdown on AMD CPUs (eg; Ryzen / Epyc) would
> definitely make sense as a requirement. It seems odd to classify that as a
> “nonstandard” use case.
> 
> On November 18, 2019 at 12:20:33 PM, Skalicky, Sam (
> sska...@amazon.com.invalid) wrote:
> 
> Thanks Patric & team for your work over the years to make MXNet fast with
> MKLDNN!
> 
> I think it would be great to make MKLDNN enabled by default. We will need
> to continue producing variants without MKLDNN for those who don’t want it
> (Marco enumerated some use cases). How do you propose to identify the pip
> wheels with/without MKLDNN? Previously we had: mxnet-mkl and mxnet-cu101mkl
> with MKLDNN. If the plain “mxnet” pip wheel now contains MKLDNN what do you
> propose we call the build without MKLDNN? mxnet-nomkl?
> 
> Thanks!
> Sam
> 
>> On Nov 18, 2019, at 11:08 AM, Marco de Abreu 
> wrote:
>> 
>> Hi Patric,
>> 
>> First of all, thanks a lot to you and your team for all the effort on
> MXNet
>> and mkldnn!
>> 
>> Generally I'm inclined towards your proposal, but I'm thinking about the
>> non-standard use cases:
>> - AMD CPU
>> - ARM CPU
>> - Windows
>> - GPU and MKLDNN enabled
>> - Fully reproducible results (medical and financial sector requested that
>> and we have some flags for cuda)
>> 
>> Is mkldnn fully compatible with these use cases? If not, what would
> happen?
>> If yes, do we have performance numbers?
>> 
>> Best regards,
>> Marco
>> 
>> Zhao, Patric  schrieb am Mo., 18. Nov. 2019,
> 14:00:
>> 
>>> Hi MXNet community,
>>> 
>>> From the first MKLDNN backend integrated in release 1.2, the community
> is
>>> continuously improving the quality and performance of MKLDNN CPU
> backend.
>>> Nowadays, the MKLDNN backend is widely used for the inference,
> especially
>>> for INT8 inference, and we got lots of very positive feedbacks from
> MXNet
>>> users.
>>> 
>>> Achieved milestones as below:
>>> 
>>> - MKLDNN integrated into Apache MXNet from release 1.2, Feb, 2018 [1]
>>> - MKLDNN backend as default CPU backend from source building, Jan, 2019
> [2]
>>> - MKLDNN subgraph optimization as default for the inference, Jul, 2019
> [3]
>>> - MKLDNN major version upgrade in release 1.6, Oct, 2019 [4]
>>> 
>>> To make more successful and technical leadership for Apache MXNet in the
>>> industry, I propose to make MKLDNN as default CPU backend in all binary
>>> distribution from the next release.
>>> The new milestone includes:
>>> 
>>> - Static link MKLDNN library in the binary avoiding the mismatch version
>>> in the runtime [5]
>>> - Make nightly build with MKLDNN default from master pre 1.7 release
>>> - Binary distribution with MKLDNN default from 1.7 release.
>>> 
>>> What will be changed:
>>> 
>>> - mxnet and mxnet-cuXX binary will be built with MKLDNN=1
>>> - mxnet-mkl and mxnet-cuXXmkl will be not changed in the minor release
>>> (1.x) and plan to remove in next major release (2.0)
>>> 
>>> Suggestions and comments are highly appreciated.
>>> 
>>> Thanks,
>>> 
>>> --Patric
>>> 
>>> 
>>> [1] https://github.com/apache/incubator-mxnet/pull/9677
>>> [2]
>>> 
> https://lists.apache.org/thread.html/bfeae6ee46374112eb4dff1470c262959101e4bffb19930926963535@%3Cdev.mxnet.apache.org%3E
>>> [3] https://github.com/apache/incubator-mxnet/pull/15518
>>> [4]
>>> 
> https://lists.apache.org/thread.html/f46ab920f18795496eafe713e6e9e561c684e06189085cec17b401dc@%3Cdev.mxnet.apache.org%3E
>>> [5] https://github.com/apache/incubator-mxnet/pull/16731
>>> 
> 
> —
> Alfredo Luque
> Software Engineer
> Machine Learning Infrastructure
> Airbnb
> San Francisco, CA



Re: Proposal to make MKLDNN as default CPU backend

2019-11-18 Thread Alfredo Luque
Verifying that there isn’t a slowdown on AMD CPUs (eg; Ryzen / Epyc) would
definitely make sense as a requirement. It seems odd to classify that as a
“nonstandard” use case.

On November 18, 2019 at 12:20:33 PM, Skalicky, Sam (
sska...@amazon.com.invalid) wrote:

Thanks Patric & team for your work over the years to make MXNet fast with
MKLDNN!

I think it would be great to make MKLDNN enabled by default. We will need
to continue producing variants without MKLDNN for those who don’t want it
(Marco enumerated some use cases). How do you propose to identify the pip
wheels with/without MKLDNN? Previously we had: mxnet-mkl and mxnet-cu101mkl
with MKLDNN. If the plain “mxnet” pip wheel now contains MKLDNN what do you
propose we call the build without MKLDNN? mxnet-nomkl?

Thanks!
Sam

> On Nov 18, 2019, at 11:08 AM, Marco de Abreu 
wrote:
>
> Hi Patric,
>
> First of all, thanks a lot to you and your team for all the effort on
MXNet
> and mkldnn!
>
> Generally I'm inclined towards your proposal, but I'm thinking about the
> non-standard use cases:
> - AMD CPU
> - ARM CPU
> - Windows
> - GPU and MKLDNN enabled
> - Fully reproducible results (medical and financial sector requested that
> and we have some flags for cuda)
>
> Is mkldnn fully compatible with these use cases? If not, what would
happen?
> If yes, do we have performance numbers?
>
> Best regards,
> Marco
>
> Zhao, Patric  schrieb am Mo., 18. Nov. 2019,
14:00:
>
>> Hi MXNet community,
>>
>> From the first MKLDNN backend integrated in release 1.2, the community
is
>> continuously improving the quality and performance of MKLDNN CPU
backend.
>> Nowadays, the MKLDNN backend is widely used for the inference,
especially
>> for INT8 inference, and we got lots of very positive feedbacks from
MXNet
>> users.
>>
>> Achieved milestones as below:
>>
>> - MKLDNN integrated into Apache MXNet from release 1.2, Feb, 2018 [1]
>> - MKLDNN backend as default CPU backend from source building, Jan, 2019
[2]
>> - MKLDNN subgraph optimization as default for the inference, Jul, 2019
[3]
>> - MKLDNN major version upgrade in release 1.6, Oct, 2019 [4]
>>
>> To make more successful and technical leadership for Apache MXNet in the
>> industry, I propose to make MKLDNN as default CPU backend in all binary
>> distribution from the next release.
>> The new milestone includes:
>>
>> - Static link MKLDNN library in the binary avoiding the mismatch version
>> in the runtime [5]
>> - Make nightly build with MKLDNN default from master pre 1.7 release
>> - Binary distribution with MKLDNN default from 1.7 release.
>>
>> What will be changed:
>>
>> - mxnet and mxnet-cuXX binary will be built with MKLDNN=1
>> - mxnet-mkl and mxnet-cuXXmkl will be not changed in the minor release
>> (1.x) and plan to remove in next major release (2.0)
>>
>> Suggestions and comments are highly appreciated.
>>
>> Thanks,
>>
>> --Patric
>>
>>
>> [1] https://github.com/apache/incubator-mxnet/pull/9677
>> [2]
>>
https://lists.apache.org/thread.html/bfeae6ee46374112eb4dff1470c262959101e4bffb19930926963535@%3Cdev.mxnet.apache.org%3E
>> [3] https://github.com/apache/incubator-mxnet/pull/15518
>> [4]
>>
https://lists.apache.org/thread.html/f46ab920f18795496eafe713e6e9e561c684e06189085cec17b401dc@%3Cdev.mxnet.apache.org%3E
>> [5] https://github.com/apache/incubator-mxnet/pull/16731
>>

—
Alfredo Luque
Software Engineer
Machine Learning Infrastructure
Airbnb
San Francisco, CA


Re: Proposal to make MKLDNN as default CPU backend

2019-11-18 Thread Skalicky, Sam
Thanks Patric & team for your work over the years to make MXNet fast with 
MKLDNN!

I think it would be great to make MKLDNN enabled by default. We will need to 
continue producing variants without MKLDNN for those who don’t want it (Marco 
enumerated some use cases). How do you propose to identify the pip wheels 
with/without MKLDNN? Previously we had: mxnet-mkl and mxnet-cu101mkl with 
MKLDNN. If the plain “mxnet” pip wheel now contains MKLDNN what do you propose 
we call the build without MKLDNN? mxnet-nomkl?

Thanks!
Sam

> On Nov 18, 2019, at 11:08 AM, Marco de Abreu  wrote:
> 
> Hi Patric,
> 
> First of all, thanks a lot to you and your team for all the effort on MXNet
> and mkldnn!
> 
> Generally I'm inclined towards your proposal, but I'm thinking about the
> non-standard use cases:
> - AMD CPU
> - ARM CPU
> - Windows
> - GPU and MKLDNN enabled
> - Fully reproducible results (medical and financial sector requested that
> and we have some flags for cuda)
> 
> Is mkldnn fully compatible with these use cases? If not, what would happen?
> If yes, do we have performance numbers?
> 
> Best regards,
> Marco
> 
> Zhao, Patric  schrieb am Mo., 18. Nov. 2019, 14:00:
> 
>> Hi MXNet community,
>> 
>> From the first MKLDNN backend integrated in release 1.2,  the community is
>> continuously improving the quality and performance of MKLDNN CPU backend.
>> Nowadays, the MKLDNN backend is widely used for the inference, especially
>> for INT8 inference,  and we got lots of very positive feedbacks from MXNet
>> users.
>> 
>> Achieved milestones as below:
>> 
>> - MKLDNN integrated into Apache MXNet from release 1.2, Feb, 2018 [1]
>> - MKLDNN backend as default CPU backend from source building, Jan, 2019 [2]
>> - MKLDNN subgraph optimization as default for the inference, Jul, 2019 [3]
>> - MKLDNN major version upgrade in release 1.6, Oct, 2019 [4]
>> 
>> To make more successful and technical leadership for Apache MXNet in the
>> industry, I propose to make MKLDNN as default CPU backend in all binary
>> distribution from the next release.
>> The new milestone includes:
>> 
>> - Static link MKLDNN library in the binary avoiding the mismatch version
>> in the runtime [5]
>> - Make nightly build with MKLDNN default from master pre 1.7 release
>> - Binary distribution with MKLDNN default from 1.7 release.
>> 
>> What will be changed:
>> 
>> - mxnet and mxnet-cuXX binary will be built with MKLDNN=1
>> - mxnet-mkl and mxnet-cuXXmkl will be not changed in the minor release
>> (1.x) and plan to remove in next major release (2.0)
>> 
>> Suggestions and comments are highly appreciated.
>> 
>> Thanks,
>> 
>> --Patric
>> 
>> 
>> [1] https://github.com/apache/incubator-mxnet/pull/9677
>> [2]
>> https://lists.apache.org/thread.html/bfeae6ee46374112eb4dff1470c262959101e4bffb19930926963535@%3Cdev.mxnet.apache.org%3E
>> [3] https://github.com/apache/incubator-mxnet/pull/15518
>> [4]
>> https://lists.apache.org/thread.html/f46ab920f18795496eafe713e6e9e561c684e06189085cec17b401dc@%3Cdev.mxnet.apache.org%3E
>> [5] https://github.com/apache/incubator-mxnet/pull/16731
>> 



Re: Proposal to make MKLDNN as default CPU backend

2019-11-18 Thread Marco de Abreu
Hi Patric,

First of all, thanks a lot to you and your team for all the effort on MXNet
and mkldnn!

Generally I'm inclined towards your proposal, but I'm thinking about the
non-standard use cases:
- AMD CPU
- ARM CPU
- Windows
- GPU and MKLDNN enabled
- Fully reproducible results (medical and financial sector requested that
and we have some flags for cuda)

Is mkldnn fully compatible with these use cases? If not, what would happen?
If yes, do we have performance numbers?

Best regards,
Marco

Zhao, Patric  schrieb am Mo., 18. Nov. 2019, 14:00:

> Hi MXNet community,
>
> From the first MKLDNN backend integrated in release 1.2,  the community is
> continuously improving the quality and performance of MKLDNN CPU backend.
> Nowadays, the MKLDNN backend is widely used for the inference, especially
> for INT8 inference,  and we got lots of very positive feedbacks from MXNet
> users.
>
> Achieved milestones as below:
>
> - MKLDNN integrated into Apache MXNet from release 1.2, Feb, 2018 [1]
> - MKLDNN backend as default CPU backend from source building, Jan, 2019 [2]
> - MKLDNN subgraph optimization as default for the inference, Jul, 2019 [3]
> - MKLDNN major version upgrade in release 1.6, Oct, 2019 [4]
>
> To make more successful and technical leadership for Apache MXNet in the
> industry, I propose to make MKLDNN as default CPU backend in all binary
> distribution from the next release.
> The new milestone includes:
>
> - Static link MKLDNN library in the binary avoiding the mismatch version
> in the runtime [5]
> - Make nightly build with MKLDNN default from master pre 1.7 release
> - Binary distribution with MKLDNN default from 1.7 release.
>
> What will be changed:
>
> - mxnet and mxnet-cuXX binary will be built with MKLDNN=1
> - mxnet-mkl and mxnet-cuXXmkl will be not changed in the minor release
> (1.x) and plan to remove in next major release (2.0)
>
> Suggestions and comments are highly appreciated.
>
> Thanks,
>
> --Patric
>
>
> [1] https://github.com/apache/incubator-mxnet/pull/9677
> [2]
> https://lists.apache.org/thread.html/bfeae6ee46374112eb4dff1470c262959101e4bffb19930926963535@%3Cdev.mxnet.apache.org%3E
> [3] https://github.com/apache/incubator-mxnet/pull/15518
> [4]
> https://lists.apache.org/thread.html/f46ab920f18795496eafe713e6e9e561c684e06189085cec17b401dc@%3Cdev.mxnet.apache.org%3E
> [5] https://github.com/apache/incubator-mxnet/pull/16731
>