[GitHub] zheng-da commented on issue #9862: Fix a race condition in converting data layouts in MKLDNN.

2018-02-28 Thread GitBox
zheng-da commented on issue #9862: Fix a race condition in converting data layouts in MKLDNN. URL: https://github.com/apache/incubator-mxnet/pull/9862#issuecomment-369194351 @cjolivier01 do you have more comments? @piiswrong do you want to review the code? The PR should have fixed

[GitHub] zheng-da commented on issue #9862: Fix a race condition in converting data layouts in MKLDNN.

2018-02-27 Thread GitBox
zheng-da commented on issue #9862: Fix a race condition in converting data layouts in MKLDNN. URL: https://github.com/apache/incubator-mxnet/pull/9862#issuecomment-369142927 @marcoabreu Reorder2Default and MKLDNNDataReorder shouldn't be called frequently. They are not in the critical path.

[GitHub] zheng-da commented on issue #9862: Fix a race condition in converting data layouts in MKLDNN.

2018-02-27 Thread GitBox
zheng-da commented on issue #9862: Fix a race condition in converting data layouts in MKLDNN. URL: https://github.com/apache/incubator-mxnet/pull/9862#issuecomment-368912733 @TaoLv I have updated the design doc to explain why we need data layout conversion.

[GitHub] zheng-da commented on issue #9862: Fix a race condition in converting data layouts in MKLDNN.

2018-02-27 Thread GitBox
zheng-da commented on issue #9862: Fix a race condition in converting data layouts in MKLDNN. URL: https://github.com/apache/incubator-mxnet/pull/9862#issuecomment-368856181 I think I'm done with changes for this PR. I ran test_gluon_model_zoo_gpu.py 1000 times and didn't see a race co

[GitHub] zheng-da commented on issue #9862: Fix a race condition in converting data layouts in MKLDNN.

2018-02-23 Thread GitBox
zheng-da commented on issue #9862: Fix a race condition in converting data layouts in MKLDNN. URL: https://github.com/apache/incubator-mxnet/pull/9862#issuecomment-368098057 @larroy this is the design doc of mkldnn: https://cwiki.apache.org/confluence/display/MXNET/The+design+of+MKLDNN+int

[GitHub] zheng-da commented on issue #9862: Fix a race condition in converting data layouts in MKLDNN.

2018-02-23 Thread GitBox
zheng-da commented on issue #9862: Fix a race condition in converting data layouts in MKLDNN. URL: https://github.com/apache/incubator-mxnet/pull/9862#issuecomment-368098057 @larroy not yet. this is the design doc of mkldnn: https://cwiki.apache.org/confluence/display/MXNET/The+design+of+M

[GitHub] zheng-da commented on issue #9862: Fix a race condition in converting data layouts in MKLDNN.

2018-02-23 Thread GitBox
zheng-da commented on issue #9862: Fix a race condition in converting data layouts in MKLDNN. URL: https://github.com/apache/incubator-mxnet/pull/9862#issuecomment-368095680 @cjolivier01 why does the race condition happen more frequently when threads run on a smaller number of CPU cores? It seems

[GitHub] zheng-da commented on issue #9862: Fix a race condition in converting data layouts in MKLDNN.

2018-02-22 Thread GitBox
zheng-da commented on issue #9862: Fix a race condition in converting data layouts in MKLDNN. URL: https://github.com/apache/incubator-mxnet/pull/9862#issuecomment-367913772 It seems the current modification still can't get rid of all race conditions in the code. The reason is that we want

[GitHub] zheng-da commented on issue #9862: Fix a race condition in converting data layouts in MKLDNN.

2018-02-22 Thread GitBox
zheng-da commented on issue #9862: Fix a race condition in converting data layouts in MKLDNN. URL: https://github.com/apache/incubator-mxnet/pull/9862#issuecomment-367876047 It's very difficult to reproduce a race condition in a deterministic way, if it's possible at all.

[GitHub] zheng-da commented on issue #9862: Fix a race condition in converting data layouts in MKLDNN.

2018-02-22 Thread GitBox
zheng-da commented on issue #9862: Fix a race condition in converting data layouts in MKLDNN. URL: https://github.com/apache/incubator-mxnet/pull/9862#issuecomment-367869410 I disabled the inference tests because I previously thought the failure was related to numeric errors and

[GitHub] zheng-da commented on issue #9862: Fix a race condition in converting data layouts in MKLDNN.

2018-02-22 Thread GitBox
zheng-da commented on issue #9862: Fix a race condition in converting data layouts in MKLDNN. URL: https://github.com/apache/incubator-mxnet/pull/9862#issuecomment-367862145 @marcoabreu enabling the tests can catch the error more easily.

[GitHub] zheng-da commented on issue #9862: Fix a race condition in converting data layouts in MKLDNN.

2018-02-22 Thread GitBox
zheng-da commented on issue #9862: Fix a race condition in converting data layouts in MKLDNN. URL: https://github.com/apache/incubator-mxnet/pull/9862#issuecomment-367861997 @cjolivier01 The seed is set so we know what the expected result is. It's easier to tell whether CPU or GPU compute

[GitHub] zheng-da commented on issue #9862: Fix a race condition in converting data layouts in MKLDNN.

2018-02-22 Thread GitBox
zheng-da commented on issue #9862: Fix a race condition in converting data layouts in MKLDNN. URL: https://github.com/apache/incubator-mxnet/pull/9862#issuecomment-367845835 @marcoabreu Previously, I disabled the inference test. Now I have enabled all tests. I also added some prints to clearly