Re: [VOTE] Release Apache MXNet(incubating) version 1.2.0.RC2

2018-05-08 Thread Marco de Abreu
Sorry for the vague phrasing, it is back to normal. This can be verified at [1]. I agree with Kellen; we will actively be working with the maintainers of dockcross to ensure their repository is brought back to a stable state which also provides proper tagging. +1 from my side now. [1]: http://jen

Re: [VOTE] Release Apache MXNet(incubating) version 1.2.0.RC2

2018-05-08 Thread kellen sunderland
Thanks Marco for the work-arounds and for getting this fixed in CI. I personally don't see this as a release blocker as it's targeting a still experimental feature (Jetson pip wheels). I also have a pretty high level of confidence that we can fix this by working with the dockcross org. This woul

Re: [VOTE] Release Apache MXNet(incubating) version 1.2.0.RC2

2018-05-08 Thread Steffen Rochel
Should be back or is back to normal? Would you please verify and update your vote on dev@ accordingly? Currently you are on record as -1. Just trying to help Anirudh get a proper vote count. Thanks Steffen (MXNet contributor hat on) On Tue, May 8, 2018 at 6:37 AM Marco de Abreu wrote: > Yes, so

Re: [VOTE] Release Apache MXNet(incubating) version 1.2.0.RC2

2018-05-08 Thread Marco de Abreu
Yes, sorry for the inconvenience! We fixed the root cause and everything should be back to normal. -Marco Steffen Rochel wrote on Tue, May 8, 2018, 14:59: > Marco - thanks for your efforts. Does this unblock the Apache MXNet v1.2 > release and change your vote? > > On Tue, May 8, 2018 at 3:00

Re: [VOTE] Release Apache MXNet(incubating) version 1.2.0.RC2

2018-05-08 Thread Steffen Rochel
Marco - thanks for your efforts. Does this unblock the Apache MXNet v1.2 release and change your vote? On Tue, May 8, 2018 at 3:00 AM Marco de Abreu wrote: > Small update regarding the ARM64 builds. I have created two pull requests > [1][2] which changed the repository to a mirror I created. Thi

Re: [VOTE] Release Apache MXNet(incubating) version 1.2.0.RC2

2018-05-08 Thread Marco de Abreu
Small update regarding the ARM64 builds. I have created two pull requests [1][2] which changed the repository to a mirror I created. This mirror was created using a cached version of the working Docker image, effectively reverting the state back to a working one. At the same time, this pins the con

Re: [VOTE] Release Apache MXNet(incubating) version 1.2.0.RC2

2018-05-07 Thread Lai Wei
Hi Anirudh, Update: Did an install on a fresh instance with USE_MKLDNN=1, works fine now. Pip install with --pre is also working fine. The problem was the mkl-dnn I installed on the old instance. Closing the issue. Thanks! Best Regards Lai We

Re: [VOTE] Release Apache MXNet(incubating) version 1.2.0.RC2

2018-05-07 Thread Lai Wei
Hi Anirudh, yes, also tried that, didn't resolve. Looking into root cause and will update. Best Regards Lai Wei https://www.linkedin.com/pub/lai-wei/2b/731/52b On Mon, May 7, 2018 at 2:15 PM, Anirudh wrote: > Hi Lai, > > I see that you used USE_MKL2017_EXPERIMENTAL=1, I am not sure if this

Re: [VOTE] Release Apache MXNet(incubating) version 1.2.0.RC2

2018-05-07 Thread Anirudh
Hi Lai, I see that you used USE_MKL2017_EXPERIMENTAL=1; I am not sure if this is the right flag. Did you try USE_MKLDNN=1? Anirudh On Mon, May 7, 2018 at 1:22 PM, Lai Wei wrote: > Hi, > > I would like to raise an issue with mxnet-mkl. The keras-mxnet package was > working fine with mxnet-mkl

Re: [VOTE] Release Apache MXNet(incubating) version 1.2.0.RC2

2018-05-07 Thread Anirudh
Hi Marco, Thanks for raising this! Can you please elaborate on where the ARM cross compilation for Jetson is documented and what the current user impact is? Can we provide this workaround to use the Dockerfile before the changes in the ARM cross compilation documentation? Did you happen to verify

Re: [VOTE] Release Apache MXNet(incubating) version 1.2.0.RC2

2018-05-07 Thread Lai Wei
Hi, I would like to raise an issue with mxnet-mkl. The keras-mxnet package was working fine with mxnet-mkl 1.1.0 for training on CPU. However, weights are not updated when I use mxnet-mkl 1.2.0b20180507. I tried both 'pip install mxnet-mkl --pre' and built from source from release branch (v1.2.0)

Re: [VOTE] Release Apache MXNet(incubating) version 1.2.0.RC2

2018-05-07 Thread Marco de Abreu
Sorry everybody, but it seems like our ARM64/Jetson build was just broken by the creators of our base crosscompile Dockerfile called 'dockcross'. This is one of our base images, used to cross-compile ARM64 (Jetson specifically). The owners merged the PR two days ago at [1] which led to our build-pi

Re: [VOTE] Release Apache MXNet(incubating) version 1.2.0.RC2

2018-05-07 Thread Haibin Lin
+1 binding. Built from source with CUDA, ran the linear classification example, and it works fine. Best. Haibin On Sun, May 6, 2018 at 10:08 PM, Steffen Rochel wrote: > +1 (non-binding). Tested with selected notebooks from The Straight Dope. > So many important enhancements everybody contributed and o

Re: [VOTE] Release Apache MXNet(incubating) version 1.2.0.RC2

2018-05-06 Thread Steffen Rochel
+1 (non-binding). Tested with selected notebooks from The Straight Dope. So many important enhancements everybody contributed and our users are waiting for. Hope we will see more votes. Steffen On Mon, May 7, 2018 at 1:07 AM Anirudh wrote: > Hi all, > > Since we don't have enough binding votes ye

Re: [VOTE] Release Apache MXNet(incubating) version 1.2.0.RC2

2018-05-06 Thread Anirudh
Hi all, Since we don't have enough binding votes yet, I am extending the vote till tomorrow (Monday May 7th), 12:50 PM PDT. Anirudh On Sun, May 6, 2018 at 4:05 PM, Anirudh wrote: > Hi Pedro, > > Thanks for the clarification. I was able to reproduce the issue with > USE_OPENMP=OFF. I wasn't abl

Re: [VOTE] Release Apache MXNet(incubating) version 1.2.0.RC2

2018-05-06 Thread Anirudh
Hi Pedro, Thanks for the clarification. I was able to reproduce the issue with USE_OPENMP=OFF. I wasn't able to reproduce the issue with Make. Since the issue is not reproducible with make and the number of customers using USE_OPENMP=OFF with cmake should be small, I agree with you that this should not be a

Re: [VOTE] Release Apache MXNet(incubating) version 1.2.0.RC2

2018-05-06 Thread Pedro Larroy
Agreed, I was not aware that the problems were not present in the release branch. On Fri, May 4, 2018 at 8:32 PM, Haibin Lin wrote: > I agree with Anirudh that the focus of the discussion should be limited to > the release branch, not the master branch. Anything that breaks on master > but work

Re: [VOTE] Release Apache MXNet(incubating) version 1.2.0.RC2

2018-05-06 Thread Pedro Larroy
Thank you Marco. The failures all seem to be flaky tests. Linking issue: Regarding the linking issue, for me it is not a blocker, as the fix will eventually be in master and the issue doesn't happen in the CI environment. This is an Ubuntu 16.04 desktop. piotr@prodesk:130:~/devel/mxnet (v1.2.0)+$ gc

Re: [VOTE] Release Apache MXNet(incubating) version 1.2.0.RC2

2018-05-05 Thread Anirudh
Hi Marco, Thanks a lot for triggering and checking on the tests! Anirudh On Sat, May 5, 2018 at 8:37 AM, Marco de Abreu wrote: > We had 4 out of 20 runs fail: > http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/ > incubator-mxnet/detail/v1.2.0/26 > - already tracked at https://

Re: [VOTE] Release Apache MXNet(incubating) version 1.2.0.RC2

2018-05-05 Thread Anirudh
Hi Pedro, Thank you for raising this issue! I am not able to reproduce this on Ubuntu 16.04 and cmake 3.5.1. Can you please provide the reproduction steps for the issue? Anirudh On Sat, May 5, 2018 at 3:12 AM, Pedro Larroy wrote: > Actually I have a linking problem in my ubuntu desktop that is

Re: [VOTE] Release Apache MXNet(incubating) version 1.2.0.RC2

2018-05-05 Thread Marco de Abreu
We had 4 out of 20 runs fail: http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/incubator-mxnet/detail/v1.2.0/26 - already tracked at https://github.com/apache/incubator-mxnet/issues/10280 since 03/27 http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/incubator-mxnet/de

Re: [VOTE] Release Apache MXNet(incubating) version 1.2.0.RC2

2018-05-05 Thread Pedro Larroy
Actually I have a linking problem in my ubuntu desktop that is fixed in master: lc::ThreadedIter, std::allocator > > >::Init(std::function, std::allocator > >**)>, std::function)::{lambda()#1}&)': /usr/include/c++/5/thread:137: undefined reference to `pthread_create' 3rdparty/dmlc-core/libdmlc.a(d

Re: [VOTE] Release Apache MXNet(incubating) version 1.2.0.RC2

2018-05-05 Thread Pedro Larroy
Hi, It looks like only the gluon lambda test is failing intermittently, and it looks like a minor numerical issue. http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/incubator-mxnet/detail/v1.2.0/20/pipeline I triggered a few builds yesterday and they all passed. I think Anirudh is right. C

Re: [VOTE] Release Apache MXNet(incubating) version 1.2.0.RC2

2018-05-05 Thread Marco de Abreu
I can start a bunch of builds. I'll send a link when they are done. -Marco Jun Wu wrote on Sat, May 5, 2018, 00:10: > +1 > I built from source and ran all the model quantization examples > successfully. > > On Fri, May 4, 2018 at 3:05 PM, Anirudh wrote: > > > Hi Pedro, Haibin, Indhu, > > > >

Re: [VOTE] Release Apache MXNet(incubating) version 1.2.0.RC2

2018-05-04 Thread Jun Wu
+1 I built from source and ran all the model quantization examples successfully. On Fri, May 4, 2018 at 3:05 PM, Anirudh wrote: > Hi Pedro, Haibin, Indhu, > > Thank you for your inputs on the release. I ran the test: > `test_module.py:test_forward_reshape` for 250k times with different seeds. >

Re: [VOTE] Release Apache MXNet(incubating) version 1.2.0.RC2

2018-05-04 Thread Anirudh
Hi Pedro, Haibin, Indhu, Thank you for your inputs on the release. I ran the test `test_module.py:test_forward_reshape` 250k times with different seeds. I was unable to reproduce the issue on the release branch. If everything goes well with CI tests by Pedro running till Sunday, I think we sh

Re: [VOTE] Release Apache MXNet(incubating) version 1.2.0.RC2

2018-05-04 Thread Indhu
+1 I've been using a CUDA build from this branch (built from source) on Ubuntu for a couple of days now and I haven't seen any issues. The flaky tests need to be fixed but this release need not be blocked for that. On Fri, May 4, 2018 at 11:32 AM, Haibin Lin wrote: > I agree with Anirudh that the

Re: [VOTE] Release Apache MXNet(incubating) version 1.2.0.RC2

2018-05-04 Thread Haibin Lin
I agree with Anirudh that the focus of the discussion should be limited to the release branch, not the master branch. Anything that breaks on master but works on release branch should not block the release itself. Best, Haibin On Fri, May 4, 2018 at 10:58 AM, Pedro Larroy wrote: > I see your

Re: [VOTE] Release Apache MXNet(incubating) version 1.2.0.RC2

2018-05-04 Thread Pedro Larroy
I see your point. I checked the failures on the v1.2.0 branch and I don't see segfaults, just minor failures due to flaky tests. I will trigger it repeatedly a few times until Sunday to have a and change my vote accordingly. http://jenkins.mxnet-ci.amazon-ml.com/job/incubator-mxnet/job/v1.2.0/ h

Re: [VOTE] Release Apache MXNet(incubating) version 1.2.0.RC2

2018-05-04 Thread Anirudh
Hi Pedro, Thank you for the suggestions. I will try to reproduce this without fixed seeds and also run it for a longer time duration. Having said that, running unit tests over and over for a couple of days will likely cause problems because there are around 42 open issues for flaky tests: https://git

Re: [VOTE] Release Apache MXNet(incubating) version 1.2.0.RC2

2018-05-04 Thread Pedro Larroy
Could you remove the fixed seeds and run it for a couple of hours with an additional loop? Also I would suggest running the unit tests over and over for a couple of days if possible. Pedro. On Thu, May 3, 2018 at 8:33 PM, Anirudh wrote: > Hi Pedro and Naveen, > > I am unable to reproduce this

Re: [VOTE] Release Apache MXNet(incubating) version 1.2.0.RC2

2018-05-04 Thread Pedro Larroy
Hi Anirudh, I see too many random failures, segfaults and other problems. Qualitatively I don't think we are in a situation to make a release. For this I would expect to see master stable for most of the builds, and that's not the case right now. My vote is still -1 non-binding. If someone is willi

Re: [VOTE] Release Apache MXNet(incubating) version 1.2.0.RC2

2018-05-03 Thread Anirudh
Hi all, I have added the CMake issue with USE_MKLDNN=1 to known issues in release notes: https://github.com/apache/incubator-mxnet/releases/tag/1.2.0.rc2 Anirudh On Thu, May 3, 2018 at 2:32 PM, Naveen Swamy wrote: > + 0 > > On Thu, May 3, 2018 at 12:44 PM, Anirudh wrote: > > > Hi Naveen, > >

Re: [VOTE] Release Apache MXNet(incubating) version 1.2.0.RC2

2018-05-03 Thread Naveen Swamy
+ 0 On Thu, May 3, 2018 at 12:44 PM, Anirudh wrote: > Hi Naveen, > > You raise a good point and I agree that by default MKLDNN by default should > be switched off. > Because of a bug in Cmakelists.txt which has been fixed as part of #10731, > which is merged to master (but not on the release bra

Re: [VOTE] Release Apache MXNet(incubating) version 1.2.0.RC2

2018-05-03 Thread Anirudh
Hi Naveen, You raise a good point and I agree that MKLDNN should be switched off by default. Because of a bug in CMakeLists.txt which has been fixed as part of #10731, which is merged to master (but not on the release branch), the users won't have MKLDNN enabled even though MKLDNN is se

Re: [VOTE] Release Apache MXNet(incubating) version 1.2.0.RC2

2018-05-03 Thread Marco de Abreu
The MKLDNN tests are not really less stable than the other tests. It's pretty much the same across all tests we have. So I wouldn't say there's a need to fix them in a separate branch. On Thu, May 3, 2018 at 9:00 PM, Naveen Swamy wrote: > I also meant(but forgot to send), we stabilize it on a se

Re: [VOTE] Release Apache MXNet(incubating) version 1.2.0.RC2

2018-05-03 Thread Naveen Swamy
I also meant (but forgot to send) that we stabilize it on a separate branch and then bring in the changes instead of blocking the PRs. On Thu, May 3, 2018 at 11:57 AM, Marco de Abreu < marco.g.ab...@googlemail.com> wrote: > I think the failing tests are really getting an issue. We now got roughly > 50

Re: [VOTE] Release Apache MXNet(incubating) version 1.2.0.RC2

2018-05-03 Thread Marco de Abreu
I think the failing tests are really becoming an issue. We now have roughly 50 test failure related issues [1], leading to an average failure rate of 50%. Considering the costs in terms of money and time per run, this is adding up quite significantly. Didn't we just remove MKLML from our codebase to

Re: [VOTE] Release Apache MXNet(incubating) version 1.2.0.RC2

2018-05-03 Thread Naveen Swamy
USE_MKLDNN is set to ON in the cmake file by default. Since it's experimental, can we turn it OFF so there is some determinism when users build and test? https://github.com/apache/incubator-mxnet/blob/60641ef1183bb4584c9356e84b6ca6d5fce58d6d/CMakeLists.txt#L23 On a separate note, since MKLDNN is

Re: [VOTE] Release Apache MXNet(incubating) version 1.2.0.RC2

2018-05-03 Thread Anirudh
Correction: I was able to reproduce the issue with MKLDNN enabled on master, but not on 1.2 branch. On Thu, May 3, 2018 at 11:33 AM, Anirudh wrote: > Hi Pedro and Naveen, > > I am unable to reproduce this issue with MKLDNN on the master but not on > the 1.2.RC2 branch. > > Did the following on 1

Re: [VOTE] Release Apache MXNet(incubating) version 1.2.0.RC2

2018-05-03 Thread Anirudh
Hi Pedro and Naveen, I am unable to reproduce this issue with MKLDNN on the master but not on the 1.2.RC2 branch. Did the following on 1.2.RC2 branch: make -j $(nproc) USE_OPENCV=1 USE_BLAS=openblas USE_DIST_KVSTORE=0 USE_CUDA=0 USE_CUDNN=0 USE_MKLDNN=1 export MXNET_STORAGE_FALLBACK_LOG_VERBOSE=

Re: [VOTE] Release Apache MXNet(incubating) version 1.2.0.RC2

2018-05-03 Thread Anirudh
Hi Pedro and Naveen, Is this issue reproducible when MXNet is built with USE_MKLDNN=0? Also, there are a bunch of MKLDNN fixes that didn't go into the release branch. Is this issue reproducible on the release branch? In my opinion, since we have marked MKLDNN as an experimental feature for the relea

Re: [VOTE] Release Apache MXNet(incubating) version 1.2.0.RC2

2018-05-03 Thread Naveen Swamy
Thanks for raising this issue Pedro. -1(binding) We were in a similar state for a while a year ago, a lot of effort went to stabilize the tests and the CI. I have seen the PR builds are non-deterministic and you have to retry over and over (wasting resources and time) and hope you get lucky. Loo

Re: [VOTE] Release Apache MXNet(incubating) version 1.2.0.RC2

2018-05-03 Thread Pedro Larroy
-1 nondeterministic failures on CI master: https://issues.apache.org/jira/browse/MXNET-396 I was able to reproduce it once on a fresh p3 instance with DLAMI, but can't reproduce it consistently. On Wed, May 2, 2018 at 9:51 PM, Anirudh wrote: > Hi all, > > As part of RC2 release, we have addressed bugs and

[VOTE] Release Apache MXNet(incubating) version 1.2.0.RC2

2018-05-02 Thread Anirudh
Hi all, As part of RC2 release, we have addressed bugs and some concerns that were raised. I would like to propose a vote to release Apache MXNet (incubating) version 1.2.0.RC2. Voting will start now (Wednesday, May 2nd) and end at 12:50 PM PDT, Sunday, May 6th. Link to release notes: https://cw