Re: [VOTE] Release Apache MXNet (incubating) version 1.6.0.rc2

2020-02-04 Thread Przemysław Trędak
Hi Pedro,

From the issue that you linked it seems that you are using the LLVM OpenMP,
whereas I believe the actual release uses libgomp (at least that's what seems
to be the conclusion from this issue:
https://github.com/apache/incubator-mxnet/issues/16891)?

Przemek
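
One way to confirm which OpenMP runtime a given build links against is to
inspect the shared library with ldd. The sketch below is illustrative only:
list_openmp_runtimes is a hypothetical helper, and the path build/libmxnet.so
is an assumption about where a local build places the library.

```shell
# Hypothetical helper: read `ldd` output on stdin and print the name of each
# OpenMP runtime it mentions (libgomp = GNU, libomp = LLVM, libiomp5 = Intel).
list_openmp_runtimes() {
  grep -oE 'lib(gomp|omp|iomp5)\.so' | sort -u
}

# Usage (the library path is an assumption about the local build tree):
#   ldd build/libmxnet.so | list_openmp_runtimes
```

If more than one name is printed, the process may end up loading two OpenMP
implementations at once, which is the situation discussed in this thread.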

On 2020/02/04 03:42:30, Pedro Larroy  wrote: 
> -1
> 
> Unit tests passed in CPU build.
> 
> I observe crashes related to openmp using cpp unit tests:
> 
> https://github.com/apache/incubator-mxnet/issues/17043
> 
> Pedro.
> 
> On Mon, Feb 3, 2020 at 6:44 PM Chaitanya Bapat  wrote:
> 
> > +1
> > Successfully built MXNet 1.6.0rc2 on Linux
> > Tested for OpPerf utility
> > For CPU -
> > https://gist.github.com/ChaiBapchya/d5ecc3e971c5a3c558d672477b4b6b9c
> >
> > Works well!
> >
> >
> >
> > On Mon, 3 Feb 2020 at 15:43, Lin Yuan  wrote:
> >
> > > +1
> > >
> > > Tested Horovod with mnist example. My compiler flags are below:
> > >
> > > [✔ CUDA, ✔ CUDNN, ✔ NCCL, ✔ CUDA_RTC, ✖ TENSORRT, ✔ CPU_SSE, ✔ CPU_SSE2,
> > ✔
> > > CPU_SSE3, ✔ CPU_SSE4_1, ✔ CPU_SSE4_2, ✖ CPU_SSE4A, ✔ CPU_AVX, ✖
> > CPU_AVX2, ✔
> > > OPENMP, ✖ SSE, ✔ F16C, ✖ JEMALLOC, ✔ BLAS_OPEN, ✖ BLAS_ATLAS, ✖
> > BLAS_MKL, ✖
> > > BLAS_APPLE, ✔ LAPACK, ✖ MKLDNN, ✔ OPENCV, ✖ CAFFE, ✖ PROFILER, ✔
> > > DIST_KVSTORE, ✖ CXX14, ✖ INT64_TENSOR_SIZE, ✖ SIGNAL_HANDLER, ✖ DEBUG, ✖
> > > TVM_OP]
> > >
> > > Lin
> > >
> > > On Sat, Feb 1, 2020 at 9:55 PM Tao Lv  wrote:
> > >
> > > > +1
> > > >
> > > > I tested below items:
> > > > 1. download artifacts from Apache dist repo;
> > > > 2. the signature looks good;
> > > > 3. build from source code with MKL-DNN and MKL on centos;
> > > > 4. run fp32 and int8 inference of ResNet50 under
> > /example/quantization/.
> > > >
> > > > thanks,
> > > > -tao
> > > >
> > > > On Sun, Feb 2, 2020 at 11:00 AM Tao Lv  wrote:
> > > >
> > > > > I see. I was looking at this page:
> > > > > https://github.com/apache/incubator-mxnet/releases/tag/1.6.0.rc2
> > > > >
> > > > > On Sun, Feb 2, 2020 at 4:54 AM Przemysław Trędak  > >
> > > > > wrote:
> > > > >
> > > > >> Hi Tao,
> > > > >>
> > > > >> Could you tell me where did you look for it and did not find it? I
> > > just
> > > > >> checked and both
> > > > >> https://dist.apache.org/repos/dist/dev/incubator/mxnet/1.6.0.rc2/
> > and
> > > > >> draft of the release on GitHub have them.
> > > > >>
> > > > >> Thank you
> > > > >> Przemek
> > > > >>
> > > > >> On 2020/02/01 14:23:11, Tao Lv  wrote:
> > > > >> > It seems the src tar and signature are missing from the tag.
> > > > >> >
> > > > >> > On Fri, Jan 31, 2020 at 11:09 AM Przemysław Trędak <
> > > > ptre...@apache.org>
> > > > >> > wrote:
> > > > >> >
> > > > >> > > Dear MXNet community,
> > > > >> > >
> > > > >> > > This is the vote to release Apache MXNet (incubating) version
> > > 1.6.0.
> > > > >> > > Voting starts today and will close on Monday 2/3/2020 23:59 PST.
> > > > >> > >
> > > > >> > > Link to release notes:
> > > > >> > >
> > > > https://cwiki.apache.org/confluence/display/MXNET/1.6.0+Release+notes
> > > > >> > >
> > > > >> > > Link to release candidate:
> > > > >> > >
> > https://github.com/apache/incubator-mxnet/releases/tag/1.6.0.rc2
> > > > >> > >
> > > > >> > > Link to source and signatures on apache dist server:
> > > > >> > >
> > https://dist.apache.org/repos/dist/dev/incubator/mxnet/1.6.0.rc2/
> > > > >> > >
> > > > >> > > The differences comparing to previous release candidate
> > 1.6.0.rc1:
> > > > >> > >  * Fixes for license issues (#17361, #17375, #17370, #17460)
> > > > >> > >  * Bugfix for saving LSTM layer parameter (#17288)
> > > > >> > >  * Bugfix for downloading the model from model zoo from multiple
> > > > >> processes
> > > > >> > > (#17372)
> > > > >> > >  * Fixed a symbol.py in AMP for GluonNLP (#17408)
> > > > >> > >
> > > > >> > >
> > > > >> > > Please remember to TEST first before voting accordingly:
> > > > >> > > +1 = approve
> > > > >> > > +0 = no opinion
> > > > >> > > -1 = disapprove (provide reason)
> > > > >> > >
> > > > >> > >
> > > > >> > > Best regards,
> > > > >> > > Przemyslaw Tredak
> > > > >> > >
> > > > >> >
> > > > >>
> > > > >
> > > >
> > >
> >
> >
> > --
> > *Chaitanya Prakash Bapat*
> > *+1 (973) 953-6299*
> >
> > 
> >
> 


Re: [VOTE] Release Apache MXNet (incubating) version 1.6.0.rc2

2020-02-04 Thread Pedro Larroy
Right. Would it be possible to have the CMake build also use libgomp for
consistency with the releases until these issues are resolved?
This can affect anyone compiling the distribution with CMake and also
happens randomly in CI, worsening the contributor experience due to CI
failures.

On Tue, Feb 4, 2020 at 9:33 AM Przemysław Trędak  wrote:

> Hi Pedro,
>
> From the issue that you linked it seems that you are using the LLVM
> OpenMP, whereas I believe the actual release uses libgomp (at least that's
> what seems to be the conclusion from this issue:
> https://github.com/apache/incubator-mxnet/issues/16891)?
>
> Przemek
>
> On 2020/02/04 03:42:30, Pedro Larroy 
> wrote:
> > -1
> >
> > Unit tests passed in CPU build.
> >
> > I observe crashes related to openmp using cpp unit tests:
> >
> > https://github.com/apache/incubator-mxnet/issues/17043
> >
> > Pedro.
> >
> > On Mon, Feb 3, 2020 at 6:44 PM Chaitanya Bapat 
> wrote:
> >
> > > +1
> > > Successfully built MXNet 1.6.0rc2 on Linux
> > > Tested for OpPerf utility
> > > For CPU -
> > > https://gist.github.com/ChaiBapchya/d5ecc3e971c5a3c558d672477b4b6b9c
> > >
> > > Works well!
> > >
> > >
> > >
> > > On Mon, 3 Feb 2020 at 15:43, Lin Yuan  wrote:
> > >
> > > > +1
> > > >
> > > > Tested Horovod with mnist example. My compiler flags are below:
> > > >
> > > > [✔ CUDA, ✔ CUDNN, ✔ NCCL, ✔ CUDA_RTC, ✖ TENSORRT, ✔ CPU_SSE, ✔
> CPU_SSE2,
> > > ✔
> > > > CPU_SSE3, ✔ CPU_SSE4_1, ✔ CPU_SSE4_2, ✖ CPU_SSE4A, ✔ CPU_AVX, ✖
> > > CPU_AVX2, ✔
> > > > OPENMP, ✖ SSE, ✔ F16C, ✖ JEMALLOC, ✔ BLAS_OPEN, ✖ BLAS_ATLAS, ✖
> > > BLAS_MKL, ✖
> > > > BLAS_APPLE, ✔ LAPACK, ✖ MKLDNN, ✔ OPENCV, ✖ CAFFE, ✖ PROFILER, ✔
> > > > DIST_KVSTORE, ✖ CXX14, ✖ INT64_TENSOR_SIZE, ✖ SIGNAL_HANDLER, ✖
> DEBUG, ✖
> > > > TVM_OP]
> > > >
> > > > Lin
> > > >
> > > > On Sat, Feb 1, 2020 at 9:55 PM Tao Lv  wrote:
> > > >
> > > > > +1
> > > > >
> > > > > I tested below items:
> > > > > 1. download artifacts from Apache dist repo;
> > > > > 2. the signature looks good;
> > > > > 3. build from source code with MKL-DNN and MKL on centos;
> > > > > 4. run fp32 and int8 inference of ResNet50 under
> > > /example/quantization/.
> > > > >
> > > > > thanks,
> > > > > -tao
> > > > >
> > > > > On Sun, Feb 2, 2020 at 11:00 AM Tao Lv  wrote:
> > > > >
> > > > > > I see. I was looking at this page:
> > > > > > https://github.com/apache/incubator-mxnet/releases/tag/1.6.0.rc2
> > > > > >
> > > > > > On Sun, Feb 2, 2020 at 4:54 AM Przemysław Trędak <
> ptre...@apache.org
> > > >
> > > > > > wrote:
> > > > > >
> > > > > >> Hi Tao,
> > > > > >>
> > > > > >> Could you tell me where did you look for it and did not find
> it? I
> > > > just
> > > > > >> checked and both
> > > > > >>
> https://dist.apache.org/repos/dist/dev/incubator/mxnet/1.6.0.rc2/
> > > and
> > > > > >> draft of the release on GitHub have them.
> > > > > >>
> > > > > >> Thank you
> > > > > >> Przemek
> > > > > >>
> > > > > >> On 2020/02/01 14:23:11, Tao Lv  wrote:
> > > > > >> > It seems the src tar and signature are missing from the tag.
> > > > > >> >
> > > > > >> > On Fri, Jan 31, 2020 at 11:09 AM Przemysław Trędak <
> > > > > ptre...@apache.org>
> > > > > >> > wrote:
> > > > > >> >
> > > > > >> > > Dear MXNet community,
> > > > > >> > >
> > > > > >> > > This is the vote to release Apache MXNet (incubating)
> version
> > > > 1.6.0.
> > > > > >> > > Voting starts today and will close on Monday 2/3/2020 23:59
> PST.
> > > > > >> > >
> > > > > >> > > Link to release notes:
> > > > > >> > >
> > > > >
> https://cwiki.apache.org/confluence/display/MXNET/1.6.0+Release+notes
> > > > > >> > >
> > > > > >> > > Link to release candidate:
> > > > > >> > >
> > > https://github.com/apache/incubator-mxnet/releases/tag/1.6.0.rc2
> > > > > >> > >
> > > > > >> > > Link to source and signatures on apache dist server:
> > > > > >> > >
> > > https://dist.apache.org/repos/dist/dev/incubator/mxnet/1.6.0.rc2/
> > > > > >> > >
> > > > > >> > > The differences comparing to previous release candidate
> > > 1.6.0.rc1:
> > > > > >> > >  * Fixes for license issues (#17361, #17375, #17370, #17460)
> > > > > >> > >  * Bugfix for saving LSTM layer parameter (#17288)
> > > > > >> > >  * Bugfix for downloading the model from model zoo from
> multiple
> > > > > >> processes
> > > > > >> > > (#17372)
> > > > > >> > >  * Fixed a symbol.py in AMP for GluonNLP (#17408)
> > > > > >> > >
> > > > > >> > >
> > > > > >> > > Please remember to TEST first before voting accordingly:
> > > > > >> > > +1 = approve
> > > > > >> > > +0 = no opinion
> > > > > >> > > -1 = disapprove (provide reason)
> > > > > >> > >
> > > > > >> > >
> > > > > >> > > Best regards,
> > > > > >> > > Przemyslaw Tredak
> > > > > >> > >
> > > > > >> >
> > > > > >>
> > > > > >
> > > > >
> > > >
> > >
> > >
> > > --
> > > *Chaitanya Prakash Bapat*
> > > *+1 (973) 953-6299*
> > >
> > > 

Re: [VOTE] Release Apache MXNet (incubating) version 1.6.0.rc2

2020-02-04 Thread Chris Olivier
When "fixing", please "fix" through actual root-cause analysis (use gdb,
for instance) and not simply by guesswork and cutting out things which
probably aren't actually at fault (blaming an OMP library that's in
worldwide distribution in the billions should be treated with great
skepticism).

On Tue, Feb 4, 2020 at 10:44 AM Lin Yuan  wrote:

> Pedro,
>
> While I agree with you we need to fix this usability issue, I don't think
> this is a release blocker as Przemek mentioned above. Could we fix this in
> the next minor release?
>
> Thanks,
>
> Lin
>
> On Tue, Feb 4, 2020 at 10:38 AM Pedro Larroy  >
> wrote:
>
> > Right. Would it be possible to have the CMake build also use libgomp for
> > consistency with the releases until these issues are resolved?
> > This can affect anyone compiling the distribution with CMake and also
> > happens randomly in CI, worsening the contributor experience due to CI
> > failures.
> >
> > On Tue, Feb 4, 2020 at 9:33 AM Przemysław Trędak 
> > wrote:
> >
> > > Hi Pedro,
> > >
> > > From the issue that you linked it seems that you are using the LLVM
> > > OpenMP, whereas I believe the actual release uses libgomp (at least
> > that's
> > > what seems to be the conclusion from this issue:
> > > https://github.com/apache/incubator-mxnet/issues/16891)?
> > >
> > > Przemek
> > >
> > > On 2020/02/04 03:42:30, Pedro Larroy 
> > > wrote:
> > > > -1
> > > >
> > > > Unit tests passed in CPU build.
> > > >
> > > > I observe crashes related to openmp using cpp unit tests:
> > > >
> > > > https://github.com/apache/incubator-mxnet/issues/17043
> > > >
> > > > Pedro.
> > > >
> > > > On Mon, Feb 3, 2020 at 6:44 PM Chaitanya Bapat  >
> > > wrote:
> > > >
> > > > > +1
> > > > > Successfully built MXNet 1.6.0rc2 on Linux
> > > > > Tested for OpPerf utility
> > > > > For CPU -
> > > > >
> https://gist.github.com/ChaiBapchya/d5ecc3e971c5a3c558d672477b4b6b9c
> > > > >
> > > > > Works well!
> > > > >
> > > > >
> > > > >
> > > > > On Mon, 3 Feb 2020 at 15:43, Lin Yuan  wrote:
> > > > >
> > > > > > +1
> > > > > >
> > > > > > Tested Horovod with mnist example. My compiler flags are below:
> > > > > >
> > > > > > [✔ CUDA, ✔ CUDNN, ✔ NCCL, ✔ CUDA_RTC, ✖ TENSORRT, ✔ CPU_SSE, ✔
> > > CPU_SSE2,
> > > > > ✔
> > > > > > CPU_SSE3, ✔ CPU_SSE4_1, ✔ CPU_SSE4_2, ✖ CPU_SSE4A, ✔ CPU_AVX, ✖
> > > > > CPU_AVX2, ✔
> > > > > > OPENMP, ✖ SSE, ✔ F16C, ✖ JEMALLOC, ✔ BLAS_OPEN, ✖ BLAS_ATLAS, ✖
> > > > > BLAS_MKL, ✖
> > > > > > BLAS_APPLE, ✔ LAPACK, ✖ MKLDNN, ✔ OPENCV, ✖ CAFFE, ✖ PROFILER, ✔
> > > > > > DIST_KVSTORE, ✖ CXX14, ✖ INT64_TENSOR_SIZE, ✖ SIGNAL_HANDLER, ✖
> > > DEBUG, ✖
> > > > > > TVM_OP]
> > > > > >
> > > > > > Lin
> > > > > >
> > > > > > On Sat, Feb 1, 2020 at 9:55 PM Tao Lv  wrote:
> > > > > >
> > > > > > > +1
> > > > > > >
> > > > > > > I tested below items:
> > > > > > > 1. download artifacts from Apache dist repo;
> > > > > > > 2. the signature looks good;
> > > > > > > 3. build from source code with MKL-DNN and MKL on centos;
> > > > > > > 4. run fp32 and int8 inference of ResNet50 under
> > > > > /example/quantization/.
> > > > > > >
> > > > > > > thanks,
> > > > > > > -tao
> > > > > > >
> > > > > > > On Sun, Feb 2, 2020 at 11:00 AM Tao Lv 
> wrote:
> > > > > > >
> > > > > > > > I see. I was looking at this page:
> > > > > > > >
> > https://github.com/apache/incubator-mxnet/releases/tag/1.6.0.rc2
> > > > > > > >
> > > > > > > > On Sun, Feb 2, 2020 at 4:54 AM Przemysław Trędak <
> > > ptre...@apache.org
> > > > > >
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > >> Hi Tao,
> > > > > > > >>
> > > > > > > >> Could you tell me where did you look for it and did not find
> > > it? I
> > > > > > just
> > > > > > > >> checked and both
> > > > > > > >>
> > > https://dist.apache.org/repos/dist/dev/incubator/mxnet/1.6.0.rc2/
> > > > > and
> > > > > > > >> draft of the release on GitHub have them.
> > > > > > > >>
> > > > > > > >> Thank you
> > > > > > > >> Przemek
> > > > > > > >>
> > > > > > > >> On 2020/02/01 14:23:11, Tao Lv  wrote:
> > > > > > > >> > It seems the src tar and signature are missing from the
> tag.
> > > > > > > >> >
> > > > > > > >> > On Fri, Jan 31, 2020 at 11:09 AM Przemysław Trędak <
> > > > > > > ptre...@apache.org>
> > > > > > > >> > wrote:
> > > > > > > >> >
> > > > > > > >> > > Dear MXNet community,
> > > > > > > >> > >
> > > > > > > >> > > This is the vote to release Apache MXNet (incubating)
> > > version
> > > > > > 1.6.0.
> > > > > > > >> > > Voting starts today and will close on Monday 2/3/2020
> > 23:59
> > > PST.
> > > > > > > >> > >
> > > > > > > >> > > Link to release notes:
> > > > > > > >> > >
> > > > > > >
> > > https://cwiki.apache.org/confluence/display/MXNET/1.6.0+Release+notes
> > > > > > > >> > >
> > > > > > > >> > > Link to release candidate:
> > > > > > > >> > >
> > > > > https://github.com/apache/incubator-mxnet/releases/tag/1.6.0.rc2
> > > > > > > >> > >
> > > > > > > >> > > Link to source and signatures on apache dist server:
> > > > > > > >> > >
> > > > > 

Re: [VOTE] Release Apache MXNet (incubating) version 1.6.0.rc2

2020-02-04 Thread Lin Yuan
Pedro,

While I agree with you we need to fix this usability issue, I don't think
this is a release blocker as Przemek mentioned above. Could we fix this in
the next minor release?

Thanks,

Lin

On Tue, Feb 4, 2020 at 10:38 AM Pedro Larroy 
wrote:

> Right. Would it be possible to have the CMake build also use libgomp for
> consistency with the releases until these issues are resolved?
> This can affect anyone compiling the distribution with CMake and also
> happens randomly in CI, worsening the contributor experience due to CI
> failures.
>
> On Tue, Feb 4, 2020 at 9:33 AM Przemysław Trędak 
> wrote:
>
> > Hi Pedro,
> >
> > From the issue that you linked it seems that you are using the LLVM
> > OpenMP, whereas I believe the actual release uses libgomp (at least
> that's
> > what seems to be the conclusion from this issue:
> > https://github.com/apache/incubator-mxnet/issues/16891)?
> >
> > Przemek
> >
> > On 2020/02/04 03:42:30, Pedro Larroy 
> > wrote:
> > > -1
> > >
> > > Unit tests passed in CPU build.
> > >
> > > I observe crashes related to openmp using cpp unit tests:
> > >
> > > https://github.com/apache/incubator-mxnet/issues/17043
> > >
> > > Pedro.
> > >
> > > On Mon, Feb 3, 2020 at 6:44 PM Chaitanya Bapat 
> > wrote:
> > >
> > > > +1
> > > > Successfully built MXNet 1.6.0rc2 on Linux
> > > > Tested for OpPerf utility
> > > > For CPU -
> > > > https://gist.github.com/ChaiBapchya/d5ecc3e971c5a3c558d672477b4b6b9c
> > > >
> > > > Works well!
> > > >
> > > >
> > > >
> > > > On Mon, 3 Feb 2020 at 15:43, Lin Yuan  wrote:
> > > >
> > > > > +1
> > > > >
> > > > > Tested Horovod with mnist example. My compiler flags are below:
> > > > >
> > > > > [✔ CUDA, ✔ CUDNN, ✔ NCCL, ✔ CUDA_RTC, ✖ TENSORRT, ✔ CPU_SSE, ✔
> > CPU_SSE2,
> > > > ✔
> > > > > CPU_SSE3, ✔ CPU_SSE4_1, ✔ CPU_SSE4_2, ✖ CPU_SSE4A, ✔ CPU_AVX, ✖
> > > > CPU_AVX2, ✔
> > > > > OPENMP, ✖ SSE, ✔ F16C, ✖ JEMALLOC, ✔ BLAS_OPEN, ✖ BLAS_ATLAS, ✖
> > > > BLAS_MKL, ✖
> > > > > BLAS_APPLE, ✔ LAPACK, ✖ MKLDNN, ✔ OPENCV, ✖ CAFFE, ✖ PROFILER, ✔
> > > > > DIST_KVSTORE, ✖ CXX14, ✖ INT64_TENSOR_SIZE, ✖ SIGNAL_HANDLER, ✖
> > DEBUG, ✖
> > > > > TVM_OP]
> > > > >
> > > > > Lin
> > > > >
> > > > > On Sat, Feb 1, 2020 at 9:55 PM Tao Lv  wrote:
> > > > >
> > > > > > +1
> > > > > >
> > > > > > I tested below items:
> > > > > > 1. download artifacts from Apache dist repo;
> > > > > > 2. the signature looks good;
> > > > > > 3. build from source code with MKL-DNN and MKL on centos;
> > > > > > 4. run fp32 and int8 inference of ResNet50 under
> > > > /example/quantization/.
> > > > > >
> > > > > > thanks,
> > > > > > -tao
> > > > > >
> > > > > > On Sun, Feb 2, 2020 at 11:00 AM Tao Lv  wrote:
> > > > > >
> > > > > > > I see. I was looking at this page:
> > > > > > >
> https://github.com/apache/incubator-mxnet/releases/tag/1.6.0.rc2
> > > > > > >
> > > > > > > On Sun, Feb 2, 2020 at 4:54 AM Przemysław Trędak <
> > ptre...@apache.org
> > > > >
> > > > > > > wrote:
> > > > > > >
> > > > > > >> Hi Tao,
> > > > > > >>
> > > > > > >> Could you tell me where did you look for it and did not find
> > it? I
> > > > > just
> > > > > > >> checked and both
> > > > > > >>
> > https://dist.apache.org/repos/dist/dev/incubator/mxnet/1.6.0.rc2/
> > > > and
> > > > > > >> draft of the release on GitHub have them.
> > > > > > >>
> > > > > > >> Thank you
> > > > > > >> Przemek
> > > > > > >>
> > > > > > >> On 2020/02/01 14:23:11, Tao Lv  wrote:
> > > > > > >> > It seems the src tar and signature are missing from the tag.
> > > > > > >> >
> > > > > > >> > On Fri, Jan 31, 2020 at 11:09 AM Przemysław Trędak <
> > > > > > ptre...@apache.org>
> > > > > > >> > wrote:
> > > > > > >> >
> > > > > > >> > > Dear MXNet community,
> > > > > > >> > >
> > > > > > >> > > This is the vote to release Apache MXNet (incubating)
> > version
> > > > > 1.6.0.
> > > > > > >> > > Voting starts today and will close on Monday 2/3/2020
> 23:59
> > PST.
> > > > > > >> > >
> > > > > > >> > > Link to release notes:
> > > > > > >> > >
> > > > > >
> > https://cwiki.apache.org/confluence/display/MXNET/1.6.0+Release+notes
> > > > > > >> > >
> > > > > > >> > > Link to release candidate:
> > > > > > >> > >
> > > > https://github.com/apache/incubator-mxnet/releases/tag/1.6.0.rc2
> > > > > > >> > >
> > > > > > >> > > Link to source and signatures on apache dist server:
> > > > > > >> > >
> > > > https://dist.apache.org/repos/dist/dev/incubator/mxnet/1.6.0.rc2/
> > > > > > >> > >
> > > > > > >> > > The differences comparing to previous release candidate
> > > > 1.6.0.rc1:
> > > > > > >> > >  * Fixes for license issues (#17361, #17375, #17370,
> #17460)
> > > > > > >> > >  * Bugfix for saving LSTM layer parameter (#17288)
> > > > > > >> > >  * Bugfix for downloading the model from model zoo from
> > multiple
> > > > > > >> processes
> > > > > > >> > > (#17372)
> > > > > > >> > >  * Fixed a symbol.py in AMP for GluonNLP (#17408)
> > > > > > >> > >
> > > > > > >> > >
> > > > > > >> > > Please remember to TEST first before voting 

Re: [VOTE] Release Apache MXNet (incubating) version 1.6.0.rc2

2020-02-04 Thread Lausen, Leonard
Hi Chris,

You previously found and fixed an OMP race condition during fork at
https://github.com/apache/incubator-mxnet/pull/17039

This time no forks are involved. Could you run the following reproducer on
master branch:

  git clone --recursive https://github.com/apache/incubator-mxnet/ mxnet
  cd mxnet
  # workaround for https://github.com/apache/incubator-mxnet/issues/17514
  git checkout a726c406964b9cd17efa826738a662e09d973972
  mkdir build; cd build
  cmake -DUSE_CPP_PACKAGE=1 -DCMAKE_BUILD_TYPE=RelWithDebInfo -GNinja -DUSE_CUDA=OFF ..
  ninja
  ./cpp-package/example/test_regress_label  # run 2-3 times to reproduce
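
Because the crash is intermittent, it can help to run the test binary in a
loop and count the failures instead of re-running it by hand. This is a
minimal sketch; count_failures is a hypothetical helper, not part of MXNet.

```shell
# Hypothetical helper: run a command N times and print how many runs failed
# (non-zero exit status, which includes termination by a signal such as SIGSEGV).
count_failures() {
  runs=$1
  shift
  failures=0
  i=0
  while [ "$i" -lt "$runs" ]; do
    "$@" >/dev/null 2>&1 || failures=$((failures + 1))
    i=$((i + 1))
  done
  echo "$failures"
}

# Usage, from the build directory of the reproducer:
#   count_failures 10 ./cpp-package/example/test_regress_label
```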


As you are an OpenMP expert, you may be able to identify the root cause with
relative ease.

Thank you,

Leonard

On Tue, 2020-02-04 at 11:06 -0800, Chris Olivier wrote:
> When "fixing", please "fix" through actual root-cause analysis (use gdb,
> for instance) and not simply by guesswork and cutting out things which
> probably aren't actually at fault (blaming an OMP library that's in
> worldwide distribution in the billions should be treated with great
> skepticism).
> 
> On Tue, Feb 4, 2020 at 10:44 AM Lin Yuan  wrote:
> 
> > Pedro,
> > 
> > While I agree with you we need to fix this usability issue, I don't think
> > this is a release blocker as Przemek mentioned above. Could we fix this in
> > the next minor release?
> > 
> > Thanks,
> > 
> > Lin
> > 
> > On Tue, Feb 4, 2020 at 10:38 AM Pedro Larroy  > wrote:
> > 
> > > Right. Would it be possible to have the CMake build also use libgomp for
> > > consistency with the releases until these issues are resolved?
> > > This can affect anyone compiling the distribution with CMake and also
> > > happens randomly in CI, worsening the contributor experience due to CI
> > > failures.
> > > 
> > > On Tue, Feb 4, 2020 at 9:33 AM Przemysław Trędak 
> > > wrote:
> > > 
> > > > Hi Pedro,
> > > > 
> > > > From the issue that you linked it seems that you are using the LLVM
> > > > OpenMP, whereas I believe the actual release uses libgomp (at least
> > > that's
> > > > what seems to be the conclusion from this issue:
> > > > https://github.com/apache/incubator-mxnet/issues/16891)?
> > > > 
> > > > Przemek
> > > > 
> > > > On 2020/02/04 03:42:30, Pedro Larroy 
> > > > wrote:
> > > > > -1
> > > > > 
> > > > > Unit tests passed in CPU build.
> > > > > 
> > > > > I observe crashes related to openmp using cpp unit tests:
> > > > > 
> > > > > https://github.com/apache/incubator-mxnet/issues/17043
> > > > > 
> > > > > Pedro.
> > > > > 
> > > > > On Mon, Feb 3, 2020 at 6:44 PM Chaitanya Bapat  > > > wrote:
> > > > > > +1
> > > > > > Successfully built MXNet 1.6.0rc2 on Linux
> > > > > > Tested for OpPerf utility
> > > > > > For CPU -
> > > > > > 
> > https://gist.github.com/ChaiBapchya/d5ecc3e971c5a3c558d672477b4b6b9c
> > > > > > Works well!
> > > > > > 
> > > > > > 
> > > > > > 
> > > > > > On Mon, 3 Feb 2020 at 15:43, Lin Yuan  wrote:
> > > > > > 
> > > > > > > +1
> > > > > > > 
> > > > > > > Tested Horovod with mnist example. My compiler flags are below:
> > > > > > > 
> > > > > > > [✔ CUDA, ✔ CUDNN, ✔ NCCL, ✔ CUDA_RTC, ✖ TENSORRT, ✔ CPU_SSE, ✔
> > > > CPU_SSE2,
> > > > > > ✔
> > > > > > > CPU_SSE3, ✔ CPU_SSE4_1, ✔ CPU_SSE4_2, ✖ CPU_SSE4A, ✔ CPU_AVX, ✖
> > > > > > CPU_AVX2, ✔
> > > > > > > OPENMP, ✖ SSE, ✔ F16C, ✖ JEMALLOC, ✔ BLAS_OPEN, ✖ BLAS_ATLAS, ✖
> > > > > > BLAS_MKL, ✖
> > > > > > > BLAS_APPLE, ✔ LAPACK, ✖ MKLDNN, ✔ OPENCV, ✖ CAFFE, ✖ PROFILER, ✔
> > > > > > > DIST_KVSTORE, ✖ CXX14, ✖ INT64_TENSOR_SIZE, ✖ SIGNAL_HANDLER, ✖
> > > > DEBUG, ✖
> > > > > > > TVM_OP]
> > > > > > > 
> > > > > > > Lin
> > > > > > > 
> > > > > > > On Sat, Feb 1, 2020 at 9:55 PM Tao Lv  wrote:
> > > > > > > 
> > > > > > > > +1
> > > > > > > > 
> > > > > > > > I tested below items:
> > > > > > > > 1. download artifacts from Apache dist repo;
> > > > > > > > 2. the signature looks good;
> > > > > > > > 3. build from source code with MKL-DNN and MKL on centos;
> > > > > > > > 4. run fp32 and int8 inference of ResNet50 under
> > > > > > /example/quantization/.
> > > > > > > > thanks,
> > > > > > > > -tao
> > > > > > > > 
> > > > > > > > On Sun, Feb 2, 2020 at 11:00 AM Tao Lv 
> > wrote:
> > > > > > > > > I see. I was looking at this page:
> > > > > > > > > 
> > > https://github.com/apache/incubator-mxnet/releases/tag/1.6.0.rc2
> > > > > > > > > On Sun, Feb 2, 2020 at 4:54 AM Przemysław Trędak <
> > > > ptre...@apache.org
> > > > > > > > > wrote:
> > > > > > > > > 
> > > > > > > > > > Hi Tao,
> > > > > > > > > > 
> > > > > > > > > > Could you tell me where did you look for it and did not find
> > > > it? I
> > > > > > > just
> > > > > > > > > > checked and both
> > > > > > > > > > 
> > > > https://dist.apache.org/repos/dist/dev/incubator/mxnet/1.6.0.rc2/
> > > > > > and
> > > > > > > > > > draft of the release on GitHub have them.
> > > > > > > > > > 
> > > > > > > > > > Thank you
> > > > > > > > > > Przemek
> > > > > > > > > > 
> > > > > > > > > > On 2020/02/01 14:23:11, Tao Lv  

Re: [VOTE] Release Apache MXNet (incubating) version 1.6.0.rc2

2020-02-04 Thread Lausen, Leonard
Using the latest upstream jemalloc, as in
https://github.com/leezu/mxnet/commit/fd4c78a635087f6164344da53a55ba2b67da2fd2
fixes the issue.

However, there were concerns that this commit relies on unreleased development
features of jemalloc (CMake build system support), so we will not merge it
until upstream ships CMake build system support in a release.

In the meantime, anyone is welcome to work on an equivalent patch based on the
custom build system in the latest stable jemalloc.
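
Until such a patch exists, the workaround suggested in the quoted message
below is to build from source with USE_JEMALLOC=OFF. As a sketch, the flags
from the reproducer in this thread would then look as follows;
mxnet_cmake_flags is a hypothetical helper used only to keep the flag list in
one place.

```shell
# Hypothetical helper: print the CMake flags from the reproducer in this
# thread, with jemalloc switched off as suggested for source builds of 1.6.0.
mxnet_cmake_flags() {
  printf '%s\n' \
    -DUSE_CPP_PACKAGE=1 \
    -DCMAKE_BUILD_TYPE=RelWithDebInfo \
    -GNinja \
    -DUSE_CUDA=OFF \
    -DUSE_JEMALLOC=OFF
}

# Usage, from an empty build directory inside the MXNet checkout:
#   cmake $(mxnet_cmake_flags) ..
#   ninja
```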

On Tue, 2020-02-04 at 22:46 +, Lausen, Leonard wrote:
> Bisect identifies 
> https://github.com/apache/incubator-mxnet/commit/425319cb59904573bd3fe1b6fe0a7381eceb9bbd
> 
> Thus this is an issue with jemalloc + LLVM OpenMP.
> 
> The correct reproducer for latest master branch is
> 
> 
>   git clone --recursive https://github.com/apache/incubator-mxnet/ mxnet
>   cd mxnet
>   # workaround for https://github.com/apache/incubator-mxnet/issues/17514
>   git checkout a726c406964b9cd17efa826738a662e09d973972
>   mkdir build; cd build
>   cmake -DUSE_CPP_PACKAGE=1 -DCMAKE_BUILD_TYPE=RelWithDebInfo -GNinja -DUSE_CUDA=OFF -DUSE_JEMALLOC=ON ..
>   ninja
>   ./cpp-package/example/test_regress_label  # run 2-3 times to reproduce
> 
> Let's move the discussion about fixing the jemalloc/OpenMP incompatibility
> to https://github.com/apache/incubator-mxnet/issues/17043
> 
> 
> 
> @Chris, could you look into this issue as it only happens with LLVM OpenMP?
> 
> 
> 
> @Przemek: For the 1.6.0 release notes, I suggest including a recommendation
> to set USE_JEMALLOC=OFF when compiling from source.
> 
> This note should probably be added in any case, as building with
> USE_JEMALLOC=ON is broken on Ubuntu 18.10 and higher, as well as on Debian
> Stable.
> 
> Given these release notes, +1 for the release.
> 
> 
> Best regards
> Leonard
> 
> On Tue, 2020-02-04 at 22:26 +, Lausen, Leonard wrote:
> > Actually, the reproducer below is wrong. The issue was apparently fixed on master
> > recently. I'm running an automated bisect and will report the result later.
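
The thread does not say how the bisect was automated. One common approach is
`git bisect run`, which treats exit status 0 as good, 125 as "skip this
commit", and other statuses from 1 to 127 as bad. The sketch below shows only
that mapping; bisect_exit_code and the bisect_step.sh script named in the
usage comment are hypothetical.

```shell
# Hypothetical helper for `git bisect run`: translate the outcome of one
# build-and-test cycle into the exit codes that git bisect understands.
#   $1 = exit status of the build step
#   $2 = number of failed test runs
bisect_exit_code() {
  if [ "$1" -ne 0 ]; then
    echo 125   # build broke: ask git bisect to skip this commit
  elif [ "$2" -gt 0 ]; then
    echo 1     # at least one crash: mark the commit as bad
  else
    echo 0     # no crash observed: mark the commit as good
  fi
}

# Usage (bisect_step.sh is a hypothetical script that builds, runs the test
# several times, and exits with the code printed by bisect_exit_code):
#   git bisect start <bad-commit> <good-commit>
#   git bisect run ./bisect_step.sh
```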
> > 
> > On Tue, 2020-02-04 at 21:44 +, Lausen, Leonard wrote:
> > > Hi Chris,
> > > 
> > > You previously found and fixed an OMP race condition during fork at
> > > https://github.com/apache/incubator-mxnet/pull/17039
> > > 
> > > This time no forks are involved. Could you run the following reproducer on
> > > master branch:
> > > 
> > >   git clone --recursive https://github.com/apache/incubator-mxnet/ mxnet
> > >   cd mxnet
> > >   # workaround for https://github.com/apache/incubator-mxnet/issues/17514
> > >   git checkout a726c406964b9cd17efa826738a662e09d973972
> > >   mkdir build; cd build
> > >   cmake -DUSE_CPP_PACKAGE=1 -DCMAKE_BUILD_TYPE=RelWithDebInfo -GNinja -DUSE_CUDA=OFF ..
> > >   ninja
> > >   ./cpp-package/example/test_regress_label  # run 2-3 times to reproduce
> > > 
> > > 
> > > As you are an OpenMP expert, you may be able to identify the root cause with
> > > relative ease.
> > > 
> > > Thank you,
> > > 
> > > Leonard
> > > 
> > > On Tue, 2020-02-04 at 11:06 -0800, Chris Olivier wrote:
> > > > When "fixing", please "fix" through actual root-cause analysis (use gdb,
> > > > for instance) and not simply by guesswork and cutting out things which
> > > > probably aren't actually at fault (blaming an OMP library that's in
> > > > worldwide distribution in the billions should be treated with great
> > > > skepticism).
> > > > 
> > > > On Tue, Feb 4, 2020 at 10:44 AM Lin Yuan  wrote:
> > > > 
> > > > > Pedro,
> > > > > 
> > > > > While I agree with you we need to fix this usability issue, I don't
> > > > > think
> > > > > this is a release blocker as Przemek mentioned above. Could we fix
> > > > > this
> > > > > in
> > > > > the next minor release?
> > > > > 
> > > > > Thanks,
> > > > > 
> > > > > Lin
> > > > > 
> > > > > On Tue, Feb 4, 2020 at 10:38 AM Pedro Larroy <
> > > > > pedro.larroy.li...@gmail.com
> > > > > wrote:
> > > > > 
> > > > > > Right. Would it be possible to have the CMake build also use libgomp
> > > > > > for
> > > > > > consistency with the releases until these issues are resolved?
> > > > > > This can affect anyone compiling the distribution with CMake and
> > > > > > also
> > > > > > happens randomly in CI, worsening the contributor experience due to
> > > > > > CI
> > > > > > failures.
> > > > > > 
> > > > > > On Tue, Feb 4, 2020 at 9:33 AM Przemysław Trędak  > > > > > >
> > > > > > wrote:
> > > > > > 
> > > > > > > Hi Pedro,
> > > > > > > 
> > > > > > > From the issue that you linked it seems that you are using the
> > > > > > > LLVM
> > > > > > > OpenMP, whereas I believe the actual release uses libgomp (at
> > > > > > > least
> > > > > > that's
> > > > > > > what seems to be the conclusion from this issue:
> > > > > > > https://github.com/apache/incubator-mxnet/issues/16891)?
> > > > > > > 
> > > > > > > Przemek
> > > > > > > 
> > > > > > > On 2020/02/04 03:42:30, Pedro Larroy  > > > > > > >
> > > > > > > wrote:
> > > > > > > > -1
> > > > > > 

Re: [VOTE] Release Apache MXNet (incubating) version 1.6.0.rc2

2020-02-04 Thread Pedro Larroy
@Chris: If you actually go and read the issue that I linked above, you can
see that I was using gdb. Maybe you can have a look into the issue if you
have an idea for a fix. The backtrace points to a segfault in the OMP library.
While the cause could be something elsewhere that triggers undefined
behaviour, taking into consideration that this is not happening with
libgomp and that other engineers believe that mixing OpenMP implementations at
runtime can cause UB, it's reasonable to believe that there's a good chance
that it is related to this. I personally don't have time to investigate this
further, as I don't think introducing this dependency is worth the trouble
it is causing, when the one provided by the platform works well enough.

0x743b284a in __kmp_fork_call () from
/home/piotr/mxnet/build/3rdparty/openmp/runtime/src/libomp.so
(gdb) bt
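
To check how many frames of a captured backtrace fall inside the OpenMP
runtime, the gdb output can be filtered by library name. This is a sketch
only: frames_from_library is a hypothetical helper, and it assumes gdb's usual
"#N  ADDR in FUNC (...) from LIB" frame format for code without debug info.

```shell
# Hypothetical helper: read a gdb backtrace on stdin and print the function
# name of every frame that gdb attributes to the given shared library.
frames_from_library() {
  grep -E "^#[0-9]+ .* from .*$1" | sed -E 's/.* in ([^ ]+) \(.*/\1/'
}

# Usage (-batch, -ex and --args are standard gdb options):
#   gdb -batch -ex run -ex bt --args ./cpp-package/example/test_regress_label 2>&1 \
#     | frames_from_library libomp.so
```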


@Lin: I personally wouldn't be comfortable releasing a version that
segfaults; I don't think that meets the quality bar. But this is up to the
community to decide, I'm only reporting what I observe.

Releasing with indications of this kind of problem causes issues later in
downstream projects and running services.

On Tue, Feb 4, 2020 at 11:07 AM Chris Olivier  wrote:

> When "fixing", please "fix" through actual root-cause analysis (use gdb,
> for instance) and not simply by guesswork and cutting out things which
> probably aren't actually at fault (blaming an OMP library that's in
> worldwide distribution in the billions should be treated with great
> skepticism).
>
> On Tue, Feb 4, 2020 at 10:44 AM Lin Yuan  wrote:
>
> > Pedro,
> >
> > While I agree with you we need to fix this usability issue, I don't think
> > this is a release blocker as Przemek mentioned above. Could we fix this
> in
> > the next minor release?
> >
> > Thanks,
> >
> > Lin
> >
> > On Tue, Feb 4, 2020 at 10:38 AM Pedro Larroy <
> pedro.larroy.li...@gmail.com
> > >
> > wrote:
> >
> > > Right. Would it be possible to have the CMake build also use libgomp
> for
> > > consistency with the releases until these issues are resolved?
> > > This can affect anyone compiling the distribution with CMake and also
> > > happens randomly in CI, worsening the contributor experience due to CI
> > > failures.
> > >
> > > On Tue, Feb 4, 2020 at 9:33 AM Przemysław Trędak 
> > > wrote:
> > >
> > > > Hi Pedro,
> > > >
> > > > From the issue that you linked it seems that you are using the LLVM
> > > > OpenMP, whereas I believe the actual release uses libgomp (at least
> > > that's
> > > > what seems to be the conclusion from this issue:
> > > > https://github.com/apache/incubator-mxnet/issues/16891)?
> > > >
> > > > Przemek
> > > >
> > > > On 2020/02/04 03:42:30, Pedro Larroy 
> > > > wrote:
> > > > > -1
> > > > >
> > > > > Unit tests passed in CPU build.
> > > > >
> > > > > I observe crashes related to openmp using cpp unit tests:
> > > > >
> > > > > https://github.com/apache/incubator-mxnet/issues/17043
> > > > >
> > > > > Pedro.
> > > > >
> > > > > On Mon, Feb 3, 2020 at 6:44 PM Chaitanya Bapat <
> chai.ba...@gmail.com
> > >
> > > > wrote:
> > > > >
> > > > > > +1
> > > > > > Successfully built MXNet 1.6.0rc2 on Linux
> > > > > > Tested for OpPerf utility
> > > > > > For CPU -
> > > > > >
> > https://gist.github.com/ChaiBapchya/d5ecc3e971c5a3c558d672477b4b6b9c
> > > > > >
> > > > > > Works well!
> > > > > >
> > > > > >
> > > > > >
> > > > > > On Mon, 3 Feb 2020 at 15:43, Lin Yuan 
> wrote:
> > > > > >
> > > > > > > +1
> > > > > > >
> > > > > > > Tested Horovod with mnist example. My compiler flags are below:
> > > > > > >
> > > > > > > [✔ CUDA, ✔ CUDNN, ✔ NCCL, ✔ CUDA_RTC, ✖ TENSORRT, ✔ CPU_SSE, ✔
> > > > CPU_SSE2,
> > > > > > ✔
> > > > > > > CPU_SSE3, ✔ CPU_SSE4_1, ✔ CPU_SSE4_2, ✖ CPU_SSE4A, ✔ CPU_AVX, ✖
> > > > > > CPU_AVX2, ✔
> > > > > > > OPENMP, ✖ SSE, ✔ F16C, ✖ JEMALLOC, ✔ BLAS_OPEN, ✖ BLAS_ATLAS, ✖
> > > > > > BLAS_MKL, ✖
> > > > > > > BLAS_APPLE, ✔ LAPACK, ✖ MKLDNN, ✔ OPENCV, ✖ CAFFE, ✖ PROFILER,
> ✔
> > > > > > > DIST_KVSTORE, ✖ CXX14, ✖ INT64_TENSOR_SIZE, ✖ SIGNAL_HANDLER, ✖
> > > > DEBUG, ✖
> > > > > > > TVM_OP]
> > > > > > >
> > > > > > > Lin
> > > > > > >
> > > > > > > On Sat, Feb 1, 2020 at 9:55 PM Tao Lv 
> wrote:
> > > > > > >
> > > > > > > > +1
> > > > > > > >
> > > > > > > > I tested below items:
> > > > > > > > 1. download artifacts from Apache dist repo;
> > > > > > > > 2. the signature looks good;
> > > > > > > > 3. build from source code with MKL-DNN and MKL on centos;
> > > > > > > > 4. run fp32 and int8 inference of ResNet50 under
> > > > > > /example/quantization/.
> > > > > > > >
> > > > > > > > thanks,
> > > > > > > > -tao
> > > > > > > >
> > > > > > > > On Sun, Feb 2, 2020 at 11:00 AM Tao Lv 
> > wrote:
> > > > > > > >
> > > > > > > > > I see. I was looking at this page:
> > > > > > > > >
> > > https://github.com/apache/incubator-mxnet/releases/tag/1.6.0.rc2
> > > > > > > > >
> > > > > > > > > On Sun, Feb 2, 2020 at 4:54 AM Przemysław Trędak <
> > > > ptre...@apache.org
> > > > > > >
> > > > 

Re: [VOTE] Release Apache MXNet (incubating) version 1.6.0.rc2

2020-02-04 Thread Pedro Larroy
Hi Przemek,

I'm fine if we add it to the release notes and try to fix it for the next
release. Changing my vote to +1.

Pedro.

On Mon, Feb 3, 2020 at 7:42 PM Pedro Larroy 
wrote:

>
> -1
>
> Unit tests passed in CPU build.
>
> I observe crashes related to openmp using cpp unit tests:
>
> https://github.com/apache/incubator-mxnet/issues/17043
>
> Pedro.
>
> On Mon, Feb 3, 2020 at 6:44 PM Chaitanya Bapat 
> wrote:
>
>> +1
>> Successfully built MXNet 1.6.0rc2 on Linux
>> Tested for OpPerf utility
>> For CPU -
>> https://gist.github.com/ChaiBapchya/d5ecc3e971c5a3c558d672477b4b6b9c
>>
>> Works well!
>>
>>
>>
>> On Mon, 3 Feb 2020 at 15:43, Lin Yuan  wrote:
>>
>> > +1
>> >
>> > Tested Horovod with mnist example. My compiler flags are below:
>> >
>> > [✔ CUDA, ✔ CUDNN, ✔ NCCL, ✔ CUDA_RTC, ✖ TENSORRT, ✔ CPU_SSE, ✔
>> CPU_SSE2, ✔
>> > CPU_SSE3, ✔ CPU_SSE4_1, ✔ CPU_SSE4_2, ✖ CPU_SSE4A, ✔ CPU_AVX, ✖
>> CPU_AVX2, ✔
>> > OPENMP, ✖ SSE, ✔ F16C, ✖ JEMALLOC, ✔ BLAS_OPEN, ✖ BLAS_ATLAS, ✖
>> BLAS_MKL, ✖
>> > BLAS_APPLE, ✔ LAPACK, ✖ MKLDNN, ✔ OPENCV, ✖ CAFFE, ✖ PROFILER, ✔
>> > DIST_KVSTORE, ✖ CXX14, ✖ INT64_TENSOR_SIZE, ✖ SIGNAL_HANDLER, ✖ DEBUG, ✖
>> > TVM_OP]
>> >
>> > Lin
>> >
>> > On Sat, Feb 1, 2020 at 9:55 PM Tao Lv  wrote:
>> >
>> > > +1
>> > >
>> > > I tested below items:
>> > > 1. download artifacts from Apache dist repo;
>> > > 2. the signature looks good;
>> > > 3. build from source code with MKL-DNN and MKL on centos;
>> > > 4. run fp32 and int8 inference of ResNet50 under
>> /example/quantization/.
>> > >
>> > > thanks,
>> > > -tao
>> > >
>> > > On Sun, Feb 2, 2020 at 11:00 AM Tao Lv  wrote:
>> > >
>> > > > I see. I was looking at this page:
>> > > > https://github.com/apache/incubator-mxnet/releases/tag/1.6.0.rc2
>> > > >
>> > > > On Sun, Feb 2, 2020 at 4:54 AM Przemysław Trędak <
>> ptre...@apache.org>
>> > > > wrote:
>> > > >
>> > > >> Hi Tao,
>> > > >>
>> > > >> Could you tell me where did you look for it and did not find it? I
>> > just
>> > > >> checked and both
>> > > >> https://dist.apache.org/repos/dist/dev/incubator/mxnet/1.6.0.rc2/
>> and
>> > > >> draft of the release on GitHub have them.
>> > > >>
>> > > >> Thank you
>> > > >> Przemek
>> > > >>
>> > > >> On 2020/02/01 14:23:11, Tao Lv  wrote:
>> > > >> > It seems the src tar and signature are missing from the tag.
>> > > >> >
>> > > >> > On Fri, Jan 31, 2020 at 11:09 AM Przemysław Trędak <
>> > > ptre...@apache.org>
>> > > >> > wrote:
>> > > >> >
>> > > >> > > Dear MXNet community,
>> > > >> > >
>> > > >> > > This is the vote to release Apache MXNet (incubating) version
>> > 1.6.0.
>> > > >> > > Voting starts today and will close on Monday 2/3/2020 23:59
>> PST.
>> > > >> > >
>> > > >> > > Link to release notes:
>> > > >> > >
>> > > https://cwiki.apache.org/confluence/display/MXNET/1.6.0+Release+notes
>> > > >> > >
>> > > >> > > Link to release candidate:
>> > > >> > >
>> https://github.com/apache/incubator-mxnet/releases/tag/1.6.0.rc2
>> > > >> > >
>> > > >> > > Link to source and signatures on apache dist server:
>> > > >> > >
>> https://dist.apache.org/repos/dist/dev/incubator/mxnet/1.6.0.rc2/
>> > > >> > >
>> > > >> > > The differences comparing to previous release candidate
>> 1.6.0.rc1:
>> > > >> > >  * Fixes for license issues (#17361, #17375, #17370, #17460)
>> > > >> > >  * Bugfix for saving LSTM layer parameter (#17288)
>> > > >> > >  * Bugfix for downloading the model from model zoo from
>> multiple
>> > > >> processes
>> > > >> > > (#17372)
>> > > >> > >  * Fixed a symbol.py in AMP for GluonNLP (#17408)
>> > > >> > >
>> > > >> > >
>> > > >> > > Please remember to TEST first before voting accordingly:
>> > > >> > > +1 = approve
>> > > >> > > +0 = no opinion
>> > > >> > > -1 = disapprove (provide reason)
>> > > >> > >
>> > > >> > >
>> > > >> > > Best regards,
>> > > >> > > Przemyslaw Tredak
>> > > >> > >
>> > > >> >
>> > > >>
>> > > >
>> > >
>> >
>>
>>
>> --
>> *Chaitanya Prakash Bapat*
>>
>


Re: [VOTE] Release Apache MXNet (incubating) version 1.6.0.rc2

2020-02-04 Thread Lausen, Leonard
Actually, the reproducer below is wrong. The issue was apparently fixed on master
recently. I'm running an automated bisect and will report the result later.
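
For readers who have not automated a bisect before, the general shape is a
script that `git bisect run` calls on every candidate commit. The sketch below
is not the script used here; `bisect_verdict` is a hypothetical helper that
only encodes the exit-code convention of `git bisect run` (0 marks the commit
good, 125 skips it, any other code from 1 to 127 marks it bad).

```shell
# bisect_verdict BUILD_STATUS TEST_STATUS
# Translate the exit statuses of a build step and a test step into the
# exit codes that `git bisect run` understands.
bisect_verdict() {
  build_status=$1
  test_status=$2
  if [ "$build_status" -ne 0 ]; then
    return 125   # commit does not build: ask git bisect to skip it
  fi
  if [ "$test_status" -ne 0 ]; then
    return 1     # test crashed or failed: mark the commit as bad
  fi
  return 0       # built and passed: mark the commit as good
}

# Usage inside a hypothetical step script passed to `git bisect run`:
#   ninja; build=$?
#   ./cpp-package/example/test_regress_label; run=$?
#   bisect_verdict "$build" "$run"; exit $?
```

Mapping every test failure to 1 matters for crashes: a process killed by
SIGSEGV is reported by the shell as status 139, and codes above 127 would
abort the bisect instead of marking the commit bad.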

On Tue, 2020-02-04 at 21:44 +, Lausen, Leonard wrote:
> Hi Chris,
> 
> you previously found and fixed an OMP race condition during fork at 
> https://github.com/apache/incubator-mxnet/pull/17039
> 
> This time no forks are involved. Could you run the following reproducer on
> master branch:
> 
>   git clone --recursive https://github.com/apache/incubator-mxnet/ mxnet
>   cd mxnet
>   # workaround for https://github.com/apache/incubator-mxnet/issues/17514
>   git checkout a726c406964b9cd17efa826738a662e09d973972
>   mkdir build; cd build
>   cmake -DUSE_CPP_PACKAGE=1 -DCMAKE_BUILD_TYPE=RelWithDebInfo -GNinja \
>     -DUSE_CUDA=OFF ..
>   ninja
>   ./cpp-package/example/test_regress_label  # run 2-3 times to reproduce
> 
> 
> As you are an OpenMP expert, you may be able to identify the root cause with
> relative ease.
> 
> Thank you,
> 
> Leonard
> 
> On Tue, 2020-02-04 at 11:06 -0800, Chris Olivier wrote:
> > When "fixing", please "fix" through actual root-cause analysis (use gdb,
> > for instance) and not simply by guesswork and cutting out things which
> > probably aren't actually at fault (blaming an OMP library that's in
> > worldwide distribution in the billions should be treated with great
> > skepticism).
> > 
> > On Tue, Feb 4, 2020 at 10:44 AM Lin Yuan  wrote:
> > 
> > > Pedro,
> > > 
> > > While I agree with you we need to fix this usability issue, I don't think
> > > this is a release blocker as Przemek mentioned above. Could we fix this in
> > > the next minor release?
> > > 
> > > Thanks,
> > > 
> > > Lin
> > > 
> > > On Tue, Feb 4, 2020 at 10:38 AM Pedro Larroy wrote:
> > > 
> > > > Right. Would it be possible to have the CMake build also use libgomp for
> > > > consistency with the releases until these issues are resolved?
> > > > This can affect anyone compiling the distribution with CMake and also
> > > > happens randomly in CI, worsening the contributor experience due to CI
> > > > failures.
> > > > 
> > > > On Tue, Feb 4, 2020 at 9:33 AM Przemysław Trędak 
> > > > wrote:
> > > > 
> > > > > Hi Pedro,
> > > > > 
> > > > > From the issue that you linked it seems that you are using the LLVM
> > > > > OpenMP, whereas I believe the actual release uses libgomp (at least
> > > > that's
> > > > > what seems to be the conclusion from this issue:
> > > > > https://github.com/apache/incubator-mxnet/issues/16891)?
> > > > > 
> > > > > Przemek
> > > > > 
> > > > > On 2020/02/04 03:42:30, Pedro Larroy 
> > > > > wrote:
> > > > > > -1
> > > > > > 
> > > > > > Unit tests passed in CPU build.
> > > > > > 
> > > > > > I observe crashes related to openmp using cpp unit tests:
> > > > > > 
> > > > > > https://github.com/apache/incubator-mxnet/issues/17043
> > > > > > 
> > > > > > Pedro.
> > > > > > 
> > > > > > > On Mon, Feb 3, 2020 at 6:44 PM Chaitanya Bapat wrote:
> > > > > > > +1
> > > > > > > Successfully built MXNet 1.6.0rc2 on Linux
> > > > > > > Tested for OpPerf utility
> > > > > > > For CPU -
> > > > > > > 
> > > https://gist.github.com/ChaiBapchya/d5ecc3e971c5a3c558d672477b4b6b9c
> > > > > > > Works well!
> > > > > > > 
> > > > > > > 
> > > > > > > 
> > > > > > > On Mon, 3 Feb 2020 at 15:43, Lin Yuan  wrote:
> > > > > > > 
> > > > > > > > +1
> > > > > > > > 
> > > > > > > > Tested Horovod with mnist example. My compiler flags are below:
> > > > > > > > 
> > > > > > > > [✔ CUDA, ✔ CUDNN, ✔ NCCL, ✔ CUDA_RTC, ✖ TENSORRT, ✔ CPU_SSE, ✔
> > > > > CPU_SSE2,
> > > > > > > ✔
> > > > > > > > CPU_SSE3, ✔ CPU_SSE4_1, ✔ CPU_SSE4_2, ✖ CPU_SSE4A, ✔ CPU_AVX, ✖
> > > > > > > CPU_AVX2, ✔
> > > > > > > > OPENMP, ✖ SSE, ✔ F16C, ✖ JEMALLOC, ✔ BLAS_OPEN, ✖ BLAS_ATLAS, ✖
> > > > > > > BLAS_MKL, ✖
> > > > > > > > BLAS_APPLE, ✔ LAPACK, ✖ MKLDNN, ✔ OPENCV, ✖ CAFFE, ✖ PROFILER, ✔
> > > > > > > > DIST_KVSTORE, ✖ CXX14, ✖ INT64_TENSOR_SIZE, ✖ SIGNAL_HANDLER, ✖
> > > > > DEBUG, ✖
> > > > > > > > TVM_OP]
> > > > > > > > 
> > > > > > > > Lin
> > > > > > > > 
> > > > > > > > On Sat, Feb 1, 2020 at 9:55 PM Tao Lv  wrote:
> > > > > > > > 
> > > > > > > > > +1
> > > > > > > > > 
> > > > > > > > > I tested below items:
> > > > > > > > > 1. download artifacts from Apache dist repo;
> > > > > > > > > 2. the signature looks good;
> > > > > > > > > 3. build from source code with MKL-DNN and MKL on centos;
> > > > > > > > > 4. run fp32 and int8 inference of ResNet50 under
> > > > > > > /example/quantization/.
> > > > > > > > > thanks,
> > > > > > > > > -tao
> > > > > > > > > 
> > > > > > > > > On Sun, Feb 2, 2020 at 11:00 AM Tao Lv 
> > > wrote:
> > > > > > > > > > I see. I was looking at this page:
> > > > > > > > > > 
> > > > https://github.com/apache/incubator-mxnet/releases/tag/1.6.0.rc2
> > > > > > > > > > On Sun, Feb 2, 2020 at 4:54 AM Przemysław Trędak <
> > > > > ptre...@apache.org
> > > > > > > > > > wrote:
> > > > > > > > > > 
> > > > > > > > > > > Hi Tao,
> > > > > > > > > 

Re: [VOTE] Release Apache MXNet (incubating) version 1.6.0.rc2

2020-02-04 Thread Lausen, Leonard
Bisect identifies 
https://github.com/apache/incubator-mxnet/commit/425319cb59904573bd3fe1b6fe0a7381eceb9bbd

Thus this is an issue with jemalloc + LLVM OpenMP (libomp).

The correct reproducer for the latest master branch is


  git clone --recursive https://github.com/apache/incubator-mxnet/ mxnet
  cd mxnet
  # workaround for https://github.com/apache/incubator-mxnet/issues/17514
  git checkout a726c406964b9cd17efa826738a662e09d973972
  mkdir build; cd build
  cmake -DUSE_CPP_PACKAGE=1 -DCMAKE_BUILD_TYPE=RelWithDebInfo -GNinja \
    -DUSE_CUDA=OFF -DUSE_JEMALLOC=ON ..
  ninja
  ./cpp-package/example/test_regress_label  # run 2-3 times to reproduce
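
Since the crash does not show up on every run, a small loop can make the last
step less tedious. This is only a sketch: `run_until_failure` is a hypothetical
helper, not part of MXNet, and the attempt count of 5 in the usage comment is
an arbitrary choice.

```shell
# run_until_failure ATTEMPTS COMMAND [ARGS...]
# Run COMMAND up to ATTEMPTS times and stop at the first failing run.
# Returns 0 if every run succeeded, otherwise the status of the failing run.
run_until_failure() {
  attempts=$1
  shift
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      i=$((i + 1))
    else
      status=$?
      echo "run $i of $attempts failed with status $status" >&2
      return "$status"
    fi
  done
  echo "all $attempts runs passed" >&2
  return 0
}

# Usage, from the build directory:
#   run_until_failure 5 ./cpp-package/example/test_regress_label
```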

Let's move the discussion about fixing the jemalloc/OpenMP incompatibility
to https://github.com/apache/incubator-mxnet/issues/17043 



@Chris, could you look into this issue as it only happens with LLVM OpenMP?



@Przemek: For the 1.6.0 release notes, I suggest including a recommendation to
set USE_JEMALLOC=OFF when compiling from source.

This note should probably be added in any case, as building with USE_JEMALLOC=ON
is broken on Ubuntu 18.10 and higher, as well as Debian Stable.
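
To see how an existing build directory was configured, the value can be read
back from CMake's cache file, which stores one `NAME:TYPE=VALUE` entry per
line. The helper below is an illustrative sketch; `cmake_cache_value` is a
hypothetical name, and the `build/` path in the usage comment is an assumption
that matches the reproducer above.

```shell
# cmake_cache_value CACHE_FILE NAME
# Print the cached value of the CMake variable NAME, or nothing if unset.
cmake_cache_value() {
  sed -n "s/^$2:[A-Z]*=//p" "$1"
}

# Usage:
#   cmake_cache_value build/CMakeCache.txt USE_JEMALLOC
# To reconfigure with the recommended setting, from the build directory:
#   cmake -DUSE_JEMALLOC=OFF ..
```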

Given these release notes, +1 for the release.


Best regards
Leonard

On Tue, 2020-02-04 at 22:26 +, Lausen, Leonard wrote:
> Actually, the reproducer below is wrong. The issue was apparently fixed on master
> recently. I'm running an automated bisect and will report the result later.
> 
> On Tue, 2020-02-04 at 21:44 +, Lausen, Leonard wrote:
> > Hi Chris,
> > 
> > you previously found and fixed an OMP race condition during fork at 
> > https://github.com/apache/incubator-mxnet/pull/17039
> > 
> > This time no forks are involved. Could you run the following reproducer on
> > master branch:
> > 
> >   git clone --recursive https://github.com/apache/incubator-mxnet/ mxnet
> >   cd mxnet
> >   # workaround for https://github.com/apache/incubator-mxnet/issues/17514
> >   git checkout a726c406964b9cd17efa826738a662e09d973972
> >   mkdir build; cd build
> >   cmake -DUSE_CPP_PACKAGE=1 -DCMAKE_BUILD_TYPE=RelWithDebInfo -GNinja \
> >     -DUSE_CUDA=OFF ..
> >   ninja
> >   ./cpp-package/example/test_regress_label  # run 2-3 times to reproduce
> > 
> > 
> > As you are an OpenMP expert, you may be able to identify the root cause with
> > relative ease.
> > 
> > Thank you,
> > 
> > Leonard
> > 
> > On Tue, 2020-02-04 at 11:06 -0800, Chris Olivier wrote:
> > > When "fixing", please "fix" through actual root-cause analysis (use gdb,
> > > for instance) and not simply by guesswork and cutting out things which
> > > probably aren't actually at fault (blaming an OMP library that's in
> > > worldwide distribution in the billions should be treated with great
> > > skepticism).
> > > 
> > > On Tue, Feb 4, 2020 at 10:44 AM Lin Yuan  wrote:
> > > 
> > > > Pedro,
> > > > 
> > > > While I agree with you we need to fix this usability issue, I don't
> > > > think
> > > > this is a release blocker as Przemek mentioned above. Could we fix this
> > > > in
> > > > the next minor release?
> > > > 
> > > > Thanks,
> > > > 
> > > > Lin
> > > > 
> > > > On Tue, Feb 4, 2020 at 10:38 AM Pedro Larroy <
> > > > pedro.larroy.li...@gmail.com
> > > > wrote:
> > > > 
> > > > > Right. Would it be possible to have the CMake build also use libgomp
> > > > > for
> > > > > consistency with the releases until these issues are resolved?
> > > > > This can affect anyone compiling the distribution with CMake and also
> > > > > happens randomly in CI, worsening the contributor experience due to CI
> > > > > failures.
> > > > > 
> > > > > On Tue, Feb 4, 2020 at 9:33 AM Przemysław Trędak 
> > > > > wrote:
> > > > > 
> > > > > > Hi Pedro,
> > > > > > 
> > > > > > From the issue that you linked it seems that you are using the LLVM
> > > > > > OpenMP, whereas I believe the actual release uses libgomp (at least
> > > > > that's
> > > > > > what seems to be the conclusion from this issue:
> > > > > > https://github.com/apache/incubator-mxnet/issues/16891)?
> > > > > > 
> > > > > > Przemek
> > > > > > 
> > > > > > On 2020/02/04 03:42:30, Pedro Larroy 
> > > > > > wrote:
> > > > > > > -1
> > > > > > > 
> > > > > > > Unit tests passed in CPU build.
> > > > > > > 
> > > > > > > I observe crashes related to openmp using cpp unit tests:
> > > > > > > 
> > > > > > > https://github.com/apache/incubator-mxnet/issues/17043
> > > > > > > 
> > > > > > > Pedro.
> > > > > > > 
> > > > > > > On Mon, Feb 3, 2020 at 6:44 PM Chaitanya Bapat <
> > > > > > > chai.ba...@gmail.com
> > > > > > wrote:
> > > > > > > > +1
> > > > > > > > Successfully built MXNet 1.6.0rc2 on Linux
> > > > > > > > Tested for OpPerf utility
> > > > > > > > For CPU -
> > > > > > > > 
> > > > https://gist.github.com/ChaiBapchya/d5ecc3e971c5a3c558d672477b4b6b9c
> > > > > > > > Works well!
> > > > > > > > 
> > > > > > > > 
> > > > > > > > 
> > > > > > > > On Mon, 3 Feb 2020 at 15:43, Lin Yuan 
> > > > > > > > wrote:
> > > > > > > > 
> > > > > > > > > +1
> > > > > > > > > 
> > > > > > > > > Tested Horovod 

Cuda 10.2 Wheels

2020-02-04 Thread Alfredo Luque
Hi folks,

Are there any blockers on releasing CUDA 10.2 compatible wheels? Based on this
readme, the packages should be available on PyPI already, but they don’t appear
to exist yet.

On the other thread, someone posted this static page that has nightly builds
hosted on S3, but it appears CUDA 10.2 wheels aren’t on there.

—
Alfredo Luque
Software Engineer
Machine Learning Infrastructure
Airbnb
San Francisco, CA