Re: [VOTE] Release Apache MXNet(incubating) version 1.2.0.RC2

2018-05-08 Thread Marco de Abreu
Sorry for the vague phrasing, it is back to normal. This can be verified at [1]. I agree with Kellen; we will actively be working with the maintainers of dockcross to ensure their repository is brought back to a stable state which also provides proper tagging. +1 from my side now. [1]:

Re: [VOTE] Release Apache MXNet(incubating) version 1.2.0.RC2

2018-05-08 Thread Steffen Rochel
Should be back or is back to normal? Would you please verify and update your vote on dev@ accordingly? Currently you are on record as -1. Just trying to help Anirudh get a proper vote count. Thanks Steffen (MXNet contributor hat on) On Tue, May 8, 2018 at 6:37 AM Marco de Abreu

Re: [VOTE] Release Apache MXNet(incubating) version 1.2.0.RC2

2018-05-08 Thread Marco de Abreu
Yes, sorry for the inconvenience! We fixed the root cause and everything should be back to normal. -Marco Steffen Rochel wrote on Tue, May 8, 2018, 14:59: > Marco - thanks for your efforts. Does this unblock the Apache MXNet v1.2 > release and change your vote? > >

Re: [VOTE] Release Apache MXNet(incubating) version 1.2.0.RC2

2018-05-08 Thread Steffen Rochel
Marco - thanks for your efforts. Does this unblock the Apache MXNet v1.2 release and change your vote? On Tue, May 8, 2018 at 3:00 AM Marco de Abreu wrote: > Small update regarding the ARM64 builds. I have created two pull requests > [1][2] which changed the

Re: [VOTE] Release Apache MXNet(incubating) version 1.2.0.RC2

2018-05-07 Thread Lai Wei
Hi Anirudh, Update: Did an install on a fresh instance with USE_MKLDNN=1, works fine now. Pip install with --pre is also working fine. The problem was the mkl-dnn I installed on the old instance. Closing the issue. Thanks! Best Regards Lai

Re: [VOTE] Release Apache MXNet(incubating) version 1.2.0.RC2

2018-05-07 Thread Lai Wei
Hi Anirudh, yes, also tried that, didn't resolve. Looking into root cause and will update. Best Regards Lai Wei https://www.linkedin.com/pub/lai-wei/2b/731/52b On Mon, May 7, 2018 at 2:15 PM, Anirudh wrote: > Hi Lai, > > I see that you used

Re: [VOTE] Release Apache MXNet(incubating) version 1.2.0.RC2

2018-05-07 Thread Anirudh
Hi Lai, I see that you used USE_MKL2017_EXPERIMENTAL=1, I am not sure if this is the right flag. Did you try USE_MKLDNN=1 ? Anirudh On Mon, May 7, 2018 at 1:22 PM, Lai Wei wrote: > Hi, > > I would like to raise an issue with mxnet-mkl. The keras-mxnet package was >

Re: [VOTE] Release Apache MXNet(incubating) version 1.2.0.RC2

2018-05-07 Thread Anirudh
Hi Marco, Thanks for raising this ! Can you please elaborate on where the arm cross compilation for Jetson is documented and what is the current user impact. Can we provide this workaround to use the dockerfile before the changes in the ARM cross compilation documentation. Did you happen to

Re: [VOTE] Release Apache MXNet(incubating) version 1.2.0.RC2

2018-05-07 Thread Lai Wei
Hi, I would like to raise an issue with mxnet-mkl. The keras-mxnet package was working fine with mxnet-mkl 1.1.0 for training on CPU. However, weights are not updated when I use mxnet-mkl 1.2.0b20180507. I tried both 'pip install mxnet-mkl --pre' and built from source from release branch (v1.2.0)

Re: [VOTE] Release Apache MXNet(incubating) version 1.2.0.RC2

2018-05-07 Thread Marco de Abreu
Sorry everybody, but it seems like our ARM64/Jetson build was just broken by the creators of our base crosscompile Dockerfile called 'dockcross'. This is one of our base images, used to cross-compile ARM64 (Jetson specifically). The owners merged the PR two days ago at [1] which led to our

Re: [VOTE] Release Apache MXNet(incubating) version 1.2.0.RC2

2018-05-06 Thread Steffen Rochel
+1 (non-binding). Tested with selected notebooks from The Straight Dope. So many important enhancements everybody contributed and our users are waiting for. Hope we will see more votes. Steffen On Mon, May 7, 2018 at 1:07 AM Anirudh wrote: > Hi all, > > Since we don't have

Re: [VOTE] Release Apache MXNet(incubating) version 1.2.0.RC2

2018-05-06 Thread Anirudh
Hi all, Since we don't have enough binding votes yet, I am extending the vote till tomorrow (Monday May 7th), 12:50 PM PDT. Anirudh On Sun, May 6, 2018 at 4:05 PM, Anirudh wrote: > Hi Pedro, > > Thanks for the clarification. I was able to reproduce the issue with >

Re: [VOTE] Release Apache MXNet(incubating) version 1.2.0.RC2

2018-05-06 Thread Anirudh
Hi Pedro, Thanks for the clarification. I was able to reproduce the issue with USE_OPENMP=OFF. I wasn't able to reproduce the issue with Make. Since the issue is not reproducible with make and the customers using USE_OPENMP=OFF with cmake should be small, I agree with you that this should not be

Re: [VOTE] Release Apache MXNet(incubating) version 1.2.0.RC2

2018-05-05 Thread Anirudh
Hi Pedro, Thank you for raising this issue! I am not able to reproduce this on ubuntu 16.04 and cmake 3.5.1. Can you please provide the reproduction steps for the issue. Anirudh On Sat, May 5, 2018 at 3:12 AM, Pedro Larroy wrote: > Actually I have a linking

Re: [VOTE] Release Apache MXNet(incubating) version 1.2.0.RC2

2018-05-05 Thread Marco de Abreu
We had 4 out of 20 runs fail: http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/incubator-mxnet/detail/v1.2.0/26 - already tracked at https://github.com/apache/incubator-mxnet/issues/10280 since 03/27

Re: [VOTE] Release Apache MXNet(incubating) version 1.2.0.RC2

2018-05-05 Thread Pedro Larroy
Actually I have a linking problem in my ubuntu desktop that is fixed in master: lc::ThreadedIter > >::Init(std::function, std::allocator >**)>,

Re: [VOTE] Release Apache MXNet(incubating) version 1.2.0.RC2

2018-05-05 Thread Pedro Larroy
Hi, it looks like only gluon test lambda is failing intermittently, and it looks like a minor numerical issue. http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/incubator-mxnet/detail/v1.2.0/20/pipeline I triggered a few builds yesterday and they all passed. I think Anirudh is right.

Re: [VOTE] Release Apache MXNet(incubating) version 1.2.0.RC2

2018-05-05 Thread Marco de Abreu
I can start a bunch of builds. I'll send a link when they are done. -Marco Jun Wu wrote on Sat, May 5, 2018, 00:10: > +1 > I built from source and ran all the model quantization examples > successfully. > > On Fri, May 4, 2018 at 3:05 PM, Anirudh

Re: [VOTE] Release Apache MXNet(incubating) version 1.2.0.RC2

2018-05-04 Thread Jun Wu
+1 I built from source and ran all the model quantization examples successfully. On Fri, May 4, 2018 at 3:05 PM, Anirudh wrote: > Hi Pedro, Haibin, Indhu, > > Thank you for your inputs on the release. I ran the test: > `test_module.py:test_forward_reshape` for 250k times

Re: [VOTE] Release Apache MXNet(incubating) version 1.2.0.RC2

2018-05-04 Thread Anirudh
Hi Pedro, Haibin, Indhu, Thank you for your inputs on the release. I ran the test: `test_module.py:test_forward_reshape` for 250k times with different seeds. I was unable to reproduce the issue on the release branch. If everything goes well with CI tests by Pedro running till Sunday, I think we
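A minimal sketch of the kind of repeated, seed-varied run described in the message above. The harness actually used for `test_forward_reshape` is not shown in the thread; `run_with_seeds` and `sample_test` are hypothetical names, and the stand-in test below is not an MXNet test.

```python
import random


def run_with_seeds(test_fn, seeds):
    """Run test_fn once per seed and return the seeds for which it failed."""
    failing = []
    for seed in seeds:
        random.seed(seed)  # reseed so any failure can be replayed from its seed
        try:
            test_fn()
        except AssertionError:
            failing.append(seed)
    return failing


def sample_test():
    """Hypothetical stand-in for a numerically sensitive unit test."""
    values = [random.random() for _ in range(100)]
    # Summation order changes rounding slightly, so compare with a tolerance.
    assert abs(sum(values) - sum(reversed(values))) < 1e-9


if __name__ == "__main__":
    failed = run_with_seeds(sample_test, range(1000))
    print(f"{len(failed)} failing seeds out of 1000")
```

Recording the failing seeds, rather than only a pass/fail count, is what would let an intermittent failure be replayed later.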

Re: [VOTE] Release Apache MXNet(incubating) version 1.2.0.RC2

2018-05-04 Thread Indhu
+1 I've been using the CUDA build from this branch (built from source) on Ubuntu for a couple of days now and I haven't seen any issue. The flaky tests need to be fixed but this release need not be blocked for that. On Fri, May 4, 2018 at 11:32 AM, Haibin Lin wrote: > I

Re: [VOTE] Release Apache MXNet(incubating) version 1.2.0.RC2

2018-05-04 Thread Haibin Lin
I agree with Anirudh that the focus of the discussion should be limited to the release branch, not the master branch. Anything that breaks on master but works on release branch should not block the release itself. Best, Haibin On Fri, May 4, 2018 at 10:58 AM, Pedro Larroy

Re: [VOTE] Release Apache MXNet(incubating) version 1.2.0.RC2

2018-05-04 Thread Pedro Larroy
I see your point. I checked the failures on the v1.2.0 branch and I don't see segfaults, just minor failures due to flaky tests. I will trigger it repeatedly a few times until Sunday to have a and change my vote accordingly. http://jenkins.mxnet-ci.amazon-ml.com/job/incubator-mxnet/job/v1.2.0/

Re: [VOTE] Release Apache MXNet(incubating) version 1.2.0.RC2

2018-05-04 Thread Anirudh
Hi Pedro, Thank you for the suggestions. I will try to reproduce this without fixed seeds and also run it for a longer time duration. Having said that, running unit tests over and over for a couple of days will likely cause problems because there are around 42 open issues for flaky tests:

Re: [VOTE] Release Apache MXNet(incubating) version 1.2.0.RC2

2018-05-04 Thread Pedro Larroy
Could you remove the fixed seeds and run it for a couple of hours with an additional loop? Also I would suggest running the unit tests over and over for a couple of days if possible. Pedro. On Thu, May 3, 2018 at 8:33 PM, Anirudh wrote: > Hi Pedro and Naveen, > > I am

Re: [VOTE] Release Apache MXNet(incubating) version 1.2.0.RC2

2018-05-04 Thread Pedro Larroy
Hi Anirudh I see too many random failures, segfaults and other problems. Qualitatively I don't think we are in a situation to make a release. For this I would expect to see master stable for most of the builds, and it's not the case right now. My vote is still -1 non binding. If someone is

Re: [VOTE] Release Apache MXNet(incubating) version 1.2.0.RC2

2018-05-03 Thread Naveen Swamy
+ 0 On Thu, May 3, 2018 at 12:44 PM, Anirudh wrote: > Hi Naveen, > > You raise a good point and I agree that MKLDNN should be switched off > by default. > Because of a bug in CMakeLists.txt which has been fixed as part of #10731, > which is merged to master

Re: [VOTE] Release Apache MXNet(incubating) version 1.2.0.RC2

2018-05-03 Thread Anirudh
Hi Naveen, You raise a good point and I agree that MKLDNN should be switched off by default. Because of a bug in CMakeLists.txt which has been fixed as part of #10731, which is merged to master (but not on the release branch), the users won't have MKLDNN enabled even though MKLDNN is

Re: [VOTE] Release Apache MXNet(incubating) version 1.2.0.RC2

2018-05-03 Thread Marco de Abreu
The MKLDNN tests are not really less stable than the other tests. It's pretty much the same across all tests we have. So I wouldn't say there's a need to fix them in a separate branch. On Thu, May 3, 2018 at 9:00 PM, Naveen Swamy wrote: > I also meant(but forgot to send), we

Re: [VOTE] Release Apache MXNet(incubating) version 1.2.0.RC2

2018-05-03 Thread Naveen Swamy
I also meant (but forgot to send) that we stabilize it on a separate branch and then bring in the changes instead of blocking the PRs. On Thu, May 3, 2018 at 11:57 AM, Marco de Abreu < marco.g.ab...@googlemail.com> wrote: > I think the failing tests are really becoming an issue. We now have roughly >

Re: [VOTE] Release Apache MXNet(incubating) version 1.2.0.RC2

2018-05-03 Thread Marco de Abreu
I think the failing tests are really becoming an issue. We now have roughly 50 test failure related issues [1], leading to an average failure rate of 50%. Considering the costs in terms of money and time per run, this is adding up quite significantly. Didn't we just remove MKLML from our codebase to
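As a rough illustration of why a 50% failure rate adds up: assuming each CI run fails independently with probability p (an assumption; the thread does not say the failures are independent), the number of runs until the first pass follows a geometric distribution with mean 1 / (1 - p). The function name below is made up for illustration.

```python
def expected_runs_until_pass(failure_rate):
    """Expected number of CI runs until one passes, assuming independent runs."""
    if not 0.0 <= failure_rate < 1.0:
        raise ValueError("failure_rate must be in [0, 1)")
    return 1.0 / (1.0 - failure_rate)


if __name__ == "__main__":
    # 0.5 is the average failure rate quoted in the message above.
    print(expected_runs_until_pass(0.5))  # 2.0
```

Under that assumption, a 50% failure rate means each change needs two CI runs on average, which doubles the time and money spent per run.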

Re: [VOTE] Release Apache MXNet(incubating) version 1.2.0.RC2

2018-05-03 Thread Naveen Swamy
USE_MKLDNN is set to ON in the cmake file by default; since it's experimental, can we turn it OFF so there is some determinism when users build and test. https://github.com/apache/incubator-mxnet/blob/60641ef1183bb4584c9356e84b6ca6d5fce58d6d/CMakeLists.txt#L23 On a separate note, since MKLDNN

Re: [VOTE] Release Apache MXNet(incubating) version 1.2.0.RC2

2018-05-03 Thread Anirudh
Correction: I was able to reproduce the issue with MKLDNN enabled on master, but not on 1.2 branch. On Thu, May 3, 2018 at 11:33 AM, Anirudh wrote: > Hi Pedro and Naveen, > > I am unable to reproduce this issue with MKLDNN on the master but not on > the 1.2.RC2 branch. >

Re: [VOTE] Release Apache MXNet(incubating) version 1.2.0.RC2

2018-05-03 Thread Anirudh
Hi Pedro and Naveen, I am unable to reproduce this issue with MKLDNN on the master but not on the 1.2.RC2 branch. Did the following on 1.2.RC2 branch: make -j $(nproc) USE_OPENCV=1 USE_BLAS=openblas USE_DIST_KVSTORE=0 USE_CUDA=0 USE_CUDNN=0 USE_MKLDNN=1 export

Re: [VOTE] Release Apache MXNet(incubating) version 1.2.0.RC2

2018-05-03 Thread Anirudh
Hi Pedro and Naveen, Is this issue reproducible when MXNet is built with USE_MKLDNN=0? Also, there are a bunch of MKLDNN fixes that didn't go into the release branch. Is this issue reproducible on the release branch ? In my opinion, since we have marked MKLDNN as experimental feature for the

Re: [VOTE] Release Apache MXNet(incubating) version 1.2.0.RC2

2018-05-03 Thread Naveen Swamy
Thanks for raising this issue, Pedro. -1 (binding) We were in a similar state for a while a year ago; a lot of effort went into stabilizing the tests and the CI. I have seen the PR builds are non-deterministic and you have to retry over and over (wasting resources and time) and hope you get lucky.

Re: [VOTE] Release Apache MXNet(incubating) version 1.2.0.RC2

2018-05-03 Thread Pedro Larroy
-1 nondeterministic failures on CI master: https://issues.apache.org/jira/browse/MXNET-396 Was able to reproduce once on a fresh p3 instance with DLAMI; can't reproduce consistently. On Wed, May 2, 2018 at 9:51 PM, Anirudh wrote: > Hi all, > > As part of RC2 release, we