Re: [VOTE] Release Apache MXNet (incubating) version 1.6.0.rc2

2020-02-04 Thread Przemysław Trędak
Hi Pedro, From the issue that you linked it seems that you are using the LLVM OpenMP, whereas I believe the actual release uses libgomp (at least that's what seems to be the conclusion from this issue: https://github.com/apache/incubator-mxnet/issues/16891)? Przemek On 2020/02/04

Re: [VOTE] Release Apache MXNet (incubating) version 1.6.0.rc2

2020-02-04 Thread Pedro Larroy
Right. Would it be possible to have the CMake build also use libgomp for consistency with the releases until these issues are resolved? This can affect anyone compiling the distribution with CMake and also happens randomly in CI, worsening the contributor experience due to CI failures. On Tue,

Re: [VOTE] Release Apache MXNet (incubating) version 1.6.0.rc2

2020-02-04 Thread Chris Olivier
When "fixing", please "fix" through actual root-cause analysis (use gdb, for instance) and not simply by guesswork and cutting out things which probably aren't actually at fault (blaming an OMP library that's in worldwide distribution in the billions should be treated with great skepticism). On

Re: [VOTE] Release Apache MXNet (incubating) version 1.6.0.rc2

2020-02-04 Thread Lin Yuan
Pedro, While I agree with you we need to fix this usability issue, I don't think this is a release blocker as Przemek mentioned above. Could we fix this in the next minor release? Thanks, Lin On Tue, Feb 4, 2020 at 10:38 AM Pedro Larroy wrote: > Right. Would it be possible to have the CMake

Re: [VOTE] Release Apache MXNet (incubating) version 1.6.0.rc2

2020-02-04 Thread Lausen, Leonard
Hi Chris, you previously found and fixed an OMP race condition during fork at https://github.com/apache/incubator-mxnet/pull/17039. This time no forks are involved. Could you run the following reproducer on master branch: git clone --recursive https://github.com/apache/incubator-mxnet/ mxnet

Re: [VOTE] Release Apache MXNet (incubating) version 1.6.0.rc2

2020-02-04 Thread Lausen, Leonard
Using latest upstream jemalloc https://github.com/leezu/mxnet/commit/fd4c78a635087f6164344da53a55ba2b67da2fd2 fixes the issue. However, there were concerns that this commit relies on unreleased development features of jemalloc (jemalloc cmake build system support) and we'll not merge this

Re: [VOTE] Release Apache MXNet (incubating) version 1.6.0.rc2

2020-02-04 Thread Pedro Larroy
@Chris: If you actually go and read the issue that I linked above, you can see that I was using gdb. Maybe you can have a look into the issue if you have an idea to fix. The backtrace points to a segfault in the omp library. While the cause could be somewhere else which is causing undefined

Re: [VOTE] Release Apache MXNet (incubating) version 1.6.0.rc2

2020-02-04 Thread Pedro Larroy
Hi Przemek, I'm fine if we add it to the release notes and try to fix it for the next release. Changing my vote to +1. Pedro. On Mon, Feb 3, 2020 at 7:42 PM Pedro Larroy wrote: > > -1 > > Unit tests passed in CPU build. > > I observe crashes related to openmp using cpp unit tests: > >

Re: [VOTE] Release Apache MXNet (incubating) version 1.6.0.rc2

2020-02-04 Thread Lausen, Leonard
Actually, the reproducer below is wrong. The issue was apparently fixed on master recently. I'm running an automated bisect and will report the result later. On Tue, 2020-02-04 at 21:44 +, Lausen, Leonard wrote: > Hi Chris, > > you previously found and fixed a OMP race condition during fork at >

Re: [VOTE] Release Apache MXNet (incubating) version 1.6.0.rc2

2020-02-04 Thread Lausen, Leonard
Bisect identifies https://github.com/apache/incubator-mxnet/commit/425319cb59904573bd3fe1b6fe0a7381eceb9bbd. Thus this is an issue with jemalloc + llvm libopenmp. The correct reproducer for latest master branch is git clone --recursive https://github.com/apache/incubator-mxnet/ mxnet cd

Cuda 10.2 Wheels

2020-02-04 Thread Alfredo Luque
Hi folks, Are there any blockers on releasing CUDA 10.2 compatible wheels? Based on this readme the packages should be available on PyPI already but they don’t appear to exist yet. On the other thread,