Re: [VOTE] Release Apache MXNet(incubating) version 1.2.0.RC2

2018-05-04 Thread Jun Wu
+1. I built from source and ran all the model quantization examples successfully.

Re: [VOTE] Release Apache MXNet(incubating) version 1.2.0.RC2

2018-05-04 Thread Anirudh
Hi Pedro, Haibin, Indhu, Thank you for your inputs on the release. I ran the test `test_module.py:test_forward_reshape` 250k times with different seeds and was unable to reproduce the issue on the release branch. If the CI tests Pedro is running until Sunday go well, I think we

Re: segmentation fault in master using mkdlnn

2018-05-04 Thread Da Zheng
I have come up with a temporary solution for this memory error: https://github.com/apache/incubator-mxnet/pull/10812 I tested it with Anirudh's command and it works fine. I call it a temporary solution because it only fixes the segfault. It seems to me that the race condition can potentially corrupt data in

Re: segmentation fault in master using mkdlnn

2018-05-04 Thread Zheng, Da
Hello Pedro, I did exactly what you said in your previous email. I edited ci/docker/runtime_functions.sh based on your patch, and here is the history of running your commands: 2004 vim ci/docker/runtime_functions.sh 2005 ci/docker/runtime_functions.sh clean_repo 2006 ci/build.py -p

Re: [VOTE] Release Apache MXNet(incubating) version 1.2.0.RC2

2018-05-04 Thread Indhu
+1. I've been using a CUDA build from this branch (built from source) on Ubuntu for a couple of days now and I haven't seen any issues. The flaky tests need to be fixed, but this release need not be blocked for that.

Re: [VOTE] Release Apache MXNet(incubating) version 1.2.0.RC2

2018-05-04 Thread Haibin Lin
I agree with Anirudh that the focus of the discussion should be limited to the release branch, not the master branch. Anything that breaks on master but works on the release branch should not block the release itself. Best, Haibin

Re: [VOTE] Release Apache MXNet(incubating) version 1.2.0.RC2

2018-05-04 Thread Pedro Larroy
I see your point. I checked the failures on the v1.2.0 branch and I don't see segfaults, just minor failures due to flaky tests. I will trigger it repeatedly a few times until Sunday to have a and change my vote accordingly. http://jenkins.mxnet-ci.amazon-ml.com/job/incubator-mxnet/job/v1.2.0/

Re: [VOTE] Release Apache MXNet(incubating) version 1.2.0.RC2

2018-05-04 Thread Anirudh
Hi Pedro, Thank you for the suggestions. I will try to reproduce this without fixed seeds and also run it for a longer time. Having said that, running the unit tests over and over for a couple of days will likely cause problems, because there are around 42 open issues for flaky tests:

Re: [VOTE] Release Apache MXNet(incubating) version 1.2.0.RC2

2018-05-04 Thread Pedro Larroy
Could you remove the fixed seeds and run it for a couple of hours with an additional loop? Also, I would suggest running the unit tests over and over for a couple of days if possible. Pedro.

Re: [VOTE] Release Apache MXNet(incubating) version 1.2.0.RC2

2018-05-04 Thread Pedro Larroy
Hi Anirudh, I see too many random failures, segfaults and other problems. Qualitatively, I don't think we are in a position to make a release. For that I would expect to see master stable for most builds, and that is not the case right now. My vote is still -1 (non-binding). If someone is

Re: segmentation fault in master using mkdlnn

2018-05-04 Thread Pedro Larroy
Hi Da. I ran it both on my Ubuntu 16.04 workstation and on a p3 instance with the DLAMI. I'm pretty confident it runs in most Linux environments. Can you post the exact commands that you ran? From your paste it is not clear to me what the problem is. Please make sure your repo is clean and all your subrepos

Master broken due to race condition of MKLDNN PR merges

2018-05-04 Thread Marco de Abreu
Hello, FYI, master is currently broken. This was caused by two conflicting PRs being merged at the same time [1][2]. The reason this is possible is the following: a PR will always be rebased on top of master if it gets a new commit. GitHub stores the result and shows the PR as successfully