roadmap discussion for release 1.7

2019-11-19 Thread Zhao, Patric
Hi MXNet community,

Release 1.6 is a work in progress and will be out soon. I think it's time to
discuss the roadmap for 1.7.

I have created a GitHub issue (#16864) for discussing new features.

Feel free to add your plans to it:

https://github.com/apache/incubator-mxnet/issues/16864

Thanks,

--Patric


RE: Proposal to make MKLDNN as default CPU backend

2019-11-19 Thread Zhao, Patric
Thanks for all of the great suggestions.

Regarding the binary releases, including builds without MKLDNN, I have
summarized the changes in a table (see attachment).

- Major changes in the Python packages: see the attached table.
- Switch on MKLDNN for the binaries without the mkl suffix in release 1.7
  (red check mark).
- Add a new mxnet-native build without MKLDNN and cuDNN (yellow background);
  track its usage/downloads for 1-2 releases, then decide whether to keep it
  long term.
- Drop all mkl-suffix binaries in the next major release, v2.x.
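If the defaults change as proposed, users may want to verify which backends their installed wheel was actually compiled with. A minimal sketch, assuming an MXNet >= 1.5 install (where `mxnet.runtime.Features` is available) and degrading gracefully otherwise:

```python
# Sketch: report whether the installed MXNet wheel was built with MKLDNN.
# Assumes mxnet >= 1.5 exposes mxnet.runtime.Features; returns None when
# mxnet is not installed in the current environment.
def mkldnn_enabled():
    try:
        from mxnet.runtime import Features
    except ImportError:
        return None  # mxnet not installed here
    return bool(Features().is_enabled("MKLDNN"))

print("MKLDNN enabled:", mkldnn_enabled())
```

The same call can check other compile-time features such as `"CUDA"` or `"CUDNN"`, which is how a user could tell the proposed package variants apart.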

Thanks,

--Patric

> -Original Message-
> From: Lin Yuan 
> Sent: Wednesday, November 20, 2019 5:40 AM
> To: dev@mxnet.incubator.apache.org
> Cc: Tao Lv 
> Subject: Re: Proposal to make MKLDNN as default CPU backend
> 
> Also per Sam's suggestion, we could still release a build without MKLDNN
> (name it mxnet-nomkldnn?) and track the usage/download for one or two
> releases. If there is no usage, we could drop that build in the future.
> 
> Best,
> 
> Lin
> 
> On Tue, Nov 19, 2019 at 1:23 PM Lin Yuan  wrote:
> 
> > Just to summarize, based on the concerns Marco raised and discussed
> > above:
> >
> > - AMD CPU (it should work with MKLDNN:
> > https://cwiki.apache.org/confluence/display/MXNET/MXNet+with+Intel+MKL-DNN+-+Performance+Benchmarking
> > )
> > - ARM CPU (we don't have it today w/o MKLDNN either)
> > - Windows (Windows support is there regardless of MKLDNN or not)
> > - GPU and MKLDNN enabled (already supported)
> > - Fully reproducible results (medical and financial sector requested
> > that and we have some flags for cuda) (The nondeterminism exists even
> > today w/o MKLDNN. We should address it regardless of MKLDNN)
> >
> > Marco, please let us know whether your concerns have been properly addressed.
> >
> > Given that MKLDNN gives a significant performance speedup on CPU, I am
> > inclined to make it the default in the pip builds.
> >
> > Best,
> >
> > Lin
> >
> > On Tue, Nov 19, 2019 at 8:08 AM Chris Olivier 
> > wrote:
> >
> >> Thanks, Patric. I was just trying to point out that there was
> >> currently no guarantee of deterministic results without MKL, so
> >> there’s not necessarily an expectation of determinism with MKL (ie
> requirement isn’t relaxed).
> >>
> >> On Mon, Nov 18, 2019 at 9:38 PM Zhao, Patric 
> >> wrote:
> >>
> >> > It may be a concern, but a little noise can't affect the final results
> >> > if the algorithm is numerically stable.
> >> > The MKLDNN backend with mxnet-mkl has been used for 2 years and we
> >> > didn't see any convergence issue caused by multi-threading.
> >> > In other words, the GPU programming model works well for training even
> >> > though nondeterminism from multiple threads exists there too.
> >> >
> >> > Training accuracy results were posted in the first PR when MKLDNN was
> >> > integrated:
> >> >
> >> > https://github.com/apache/incubator-mxnet/pull/8302#issuecomment-359674818
> >> >
> >> > In conclusion, such an issue is very unlikely, and I believe we can
> >> > find a solution if it ever happens.
> >> >
> >> > Thanks,
> >> >
> >> > --Patric
> >> >
> >> >
> >> > > -Original Message-
> >> > > From: Chris Olivier 
> >> > > Sent: Tuesday, November 19, 2019 11:51 AM
> >> > > To: dev@mxnet.incubator.apache.org
> >> > > Cc: Tao Lv 
> >> > > Subject: Re: Proposal to make MKLDNN as default CPU backend
> >> > >
> >> > > (for non mkl dropout, for instance)
> >> > >
> >> > > On Mon, Nov 18, 2019 at 7:50 PM Chris Olivier
> >> > > 
> >> > > wrote:
> >> > >
> >> > > > To address the deterministic item, I know for a fact that
> >> > > > training will not be deterministic in some cases where the “parallel
> random”
> >> > > > class is utilized in parallel threads, such as OMP, if the
> >> > > > number of cores is different, even with the same seed, because
> >> > > > threads are seeded independently and different number of
> >> > > > threads will end up generating different random number
> >> > > > sequences. Dropout operator being
> >> > > an example.
> >> > > >
> >> > > > On Mon, Nov 18, 2019 at 6:39 PM Alfredo Luque
> >> > > >  wrote:
> >> > > >
> >> > > >> For AMD CPUs, you’d want to perform validation because now
> >> > > >> MKL-DNN would be enabled by default. Historically, other intel
> >> > > >> libraries (along with the ICC
> >> > > >> compiler) have had performance issues on AMD CPUs. It’s just
> >> > > >> worth double checking to make sure that’s not the case here.
> >> > > >> Perhaps some MKL-DNN authors can chime in though. It’s not
> >> > > >> sufficient to double check that an
> >> > > >> AVX2 package passes tests.
> >> > > >>
> >> > > >> Agreed in the case we’re not releasing ARM binaries.
> >> > > >>
> >> > > >> The reproducibility argument is around the results being
> >> numerically
> >> > > >> reproducible. That is, eg; if I train a model with some fixed
> >> > > >> set
> >> of
> >> > > >> data, some random seed, etc. and then run inference on it do I
> >> > > >> get the exact same floating point values for the weights and 



Re: Proposal to make MKLDNN as default CPU backend

2019-11-19 Thread Chris Olivier
Thanks, Patric. I was just trying to point out that there was currently no
guarantee of deterministic results without MKL, so there’s not necessarily
an expectation of determinism with MKL (ie requirement isn’t relaxed).
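The per-thread seeding effect discussed in this thread can be demonstrated with a small, self-contained sketch. It is pure Python emulating an OMP-style parallel dropout mask; the helper function is hypothetical, not MXNet code:

```python
import random

def parallel_dropout_mask(n, p, num_threads, base_seed):
    """Emulate an OMP-style parallel dropout mask: each 'thread' gets its
    own RNG, seeded from (base_seed, thread_id), and fills its own chunk."""
    mask = [0] * n
    chunk = (n + num_threads - 1) // num_threads
    for tid in range(num_threads):
        rng = random.Random(base_seed * 100003 + tid)  # per-thread seed
        for i in range(tid * chunk, min((tid + 1) * chunk, n)):
            mask[i] = 1 if rng.random() >= p else 0  # 1 = keep element
    return mask

# Same global seed, different "core counts": beyond thread 0's shared prefix,
# the masks generally differ, because each thread draws its own sequence.
two = parallel_dropout_mask(16, 0.5, num_threads=2, base_seed=42)
four = parallel_dropout_mask(16, 0.5, num_threads=4, base_seed=42)
print(two)
print(four)
```

For a fixed thread count the result is fully deterministic; it is only the *thread count itself* that becomes part of the effective seed, which is exactly the behavior described for the Dropout operator below.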

On Mon, Nov 18, 2019 at 9:38 PM Zhao, Patric  wrote:

> It may be a concern, but a little noise can't affect the final results if
> the algorithm is numerically stable.
> The MKLDNN backend with mxnet-mkl has been used for 2 years and we didn't
> see any convergence issue caused by multi-threading.
> In other words, the GPU programming model works well for training even
> though nondeterminism from multiple threads exists there too.
>
> Training accuracy results were posted in the first PR when MKLDNN was
> integrated:
> https://github.com/apache/incubator-mxnet/pull/8302#issuecomment-359674818
>
> In conclusion, such an issue is very unlikely, and I believe we can find a
> solution if it ever happens.
>
> Thanks,
>
> --Patric
>
>
> > -Original Message-
> > From: Chris Olivier 
> > Sent: Tuesday, November 19, 2019 11:51 AM
> > To: dev@mxnet.incubator.apache.org
> > Cc: Tao Lv 
> > Subject: Re: Proposal to make MKLDNN as default CPU backend
> >
> > (for non mkl dropout, for instance)
> >
> > On Mon, Nov 18, 2019 at 7:50 PM Chris Olivier 
> > wrote:
> >
> > > To address the deterministic item: I know for a fact that training will
> > > not be deterministic in some cases where the “parallel random” class is
> > > used from parallel threads (such as OMP) and the number of cores differs.
> > > Even with the same seed, threads are seeded independently, so a different
> > > number of threads ends up generating different random number sequences.
> > > The Dropout operator is an example.
> > >
> > > On Mon, Nov 18, 2019 at 6:39 PM Alfredo Luque
> > >  wrote:
> > >
> > >> For AMD CPUs, you’d want to perform validation because now MKL-DNN
> > >> would be enabled by default. Historically, other Intel libraries
> > >> (along with the ICC
> > >> compiler) have had performance issues on AMD CPUs. It’s just worth
> > >> double checking to make sure that’s not the case here. Perhaps some
> > >> MKL-DNN authors can chime in though. It’s not sufficient to double
> > >> check that an
> > >> AVX2 package passes tests.
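Validating on AMD before trusting the default starts with knowing what the host actually is. A minimal, Linux-only sketch (the helper is hypothetical, reading `/proc/cpuinfo`; on other platforms it simply reports unavailability) of the kind of quick check one might run on an m5a instance:

```python
# Sketch (Linux-only): report the CPU vendor and whether AVX2 is present,
# e.g. to confirm an m5a (AMD) host before benchmarking an MKL-DNN build.
def cpu_vendor_and_avx2(cpuinfo_text):
    vendor, avx2 = "unknown", False
    for line in cpuinfo_text.splitlines():
        if line.startswith("vendor_id"):
            vendor = line.split(":", 1)[1].strip()  # e.g. AuthenticAMD
        elif line.startswith("flags") and " avx2 " in " " + line + " ":
            avx2 = True  # avx2 appears in the CPU flag list
    return vendor, avx2

try:
    with open("/proc/cpuinfo") as f:
        print(cpu_vendor_and_avx2(f.read()))
except FileNotFoundError:
    print("not Linux; /proc/cpuinfo unavailable")
```

This only confirms the ISA level; it says nothing about vendor-specific code paths inside a library, which is why the actual benchmarking Alfredo suggests would still be needed.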
> > >>
> > >> Agreed in the case we’re not releasing ARM binaries.
> > >>
> > >> The reproducibility argument is about the results being numerically
> > >> reproducible. That is, e.g., if I train a model with some fixed set of
> > >> data, some random seed, etc., and then run inference on it, do I get
> > >> the exact same floating point values for the weights and results?
> > >> Does MxNet already offer this without MKL-DNN?
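What "numerically reproducible" means here can be made concrete with a toy sketch. This is a NumPy, single-threaded illustration, not MXNet's training path: two seeded runs of the same toy gradient descent should match bit for bit, and this is the guarantee that thread-count-dependent RNGs or reduction orders can break.

```python
import numpy as np

def train(seed, steps=200):
    """Toy single-threaded 'training' run: seeded data, seeded init,
    plain gradient descent on a linear model."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(64, 3))
    y = X @ np.array([1.0, -2.0, 0.5]) + 0.01 * rng.normal(size=64)
    w = rng.normal(size=3)
    for _ in range(steps):
        # Gradient of mean squared error for the linear model
        w -= 0.05 * (2.0 / len(y)) * (X.T @ (X @ w - y))
    return w

# Bitwise reproducibility: same seed, same code path, single thread --
# the weights should match exactly, not just to within a tolerance.
print(np.array_equal(train(0), train(0)))
```

Once parallel reductions or per-thread RNG streams enter the picture, exact equality like this is no longer guaranteed, which is the crux of the question Alfredo raises above.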
> > >>
> > >> On November 18, 2019 at 6:32:07 PM, Tao Lv (mutou...@gmail.com)
> > wrote:
> > >>
> > >> Regarding the cases listed by Marco:
> > >> - AMD CPU
> > >> From my architecture knowledge, what works on C4 instances (with AVX2
> > >> support) should also work well on m5a, right? I think mxnet-mkl and
> > >> mxnet-cuxxmkl packages have been fully validated on AVX2 machines.
> > >> Also, we didn't perform any validation on AMD CPUs before; why do we
> > >> need to do that this time?
> > >>
> > >> - ARM CPU
> > >> I don't think we're releasing any convenience binaries for ARM CPUs.
> > >> This proposal mainly targets those pypi packages.
> > >>
> > >> - Windows
> > >> Already validated by CI. We're also releasing mxnet-mkl packages for
> Win.
> > >>
> > >> - GPU and MKLDNN enabled
> > >> Already validated by CI and mxnet-cuxxmkl packages have been released
> > >> for several versions.
> > >>
> > >> - Fully reproducible results (medical and financial sector requested
> > >> that and we have some flags for cuda) Not sure I understand this
> > >> case. We have had the MKL-DNN backend for a while; its functionality
> > >> and correctness have been verified by MXNet users.
> > >>
> > >> -tao
> > >>
> > >> On Tue, Nov 19, 2019 at 4:41 AM Marco de Abreu
> > >> 
> > >> wrote:
> > >>
> > >> > Sorry, my intent with the "non-standard" phrase was not about general
> > >> > MXNet but rather about MKLDNN's point of view: considering that it's
> > >> > developed by Intel, I assumed that MKLDNN might consider non-Intel
> > >> > use cases non-standard.
> > >> >
> > >> > -Marco
> > >> >
> > >> > Skalicky, Sam  schrieb am Mo., 18. Nov.
> > >> 2019,
> > >> > 21:34:
> > >> >
> > >> > > Thanks Alfredo, if you can create a GitHub issue with notes/steps,
> > >> > > we can add this to the todo list for integrating with the MXNet CI
> > >> > > to test on m5a instances too. Then we can start tracking this on a
> > >> > > regular basis. It would be great to actually test on ARM instances
> > >> > > now that AWS has A1 instances too... I'll add it to the wish list ;-D
> > >> > >
> > >> > > Sam
> > >> > >
> > >> > > > On Nov 18, 2019, at 12:32 PM, Alfredo Luque <
> > >> alfredo.lu...@airbnb.com