[GitHub] zheng-da commented on a change in pull request #9862: Fix a race condition in converting data layouts in MKLDNN.

2018-02-27 Thread GitBox
zheng-da commented on a change in pull request #9862: Fix a race condition in 
converting data layouts in MKLDNN.
URL: https://github.com/apache/incubator-mxnet/pull/9862#discussion_r171157571
 
 

 ##
 File path: src/ndarray/ndarray.cc
 ##
 @@ -375,7 +375,45 @@ void NDArray::Chunk::Reorder2Default() {
   CheckAndAlloc(def_pd.get_size());
   // TODO(zhengda) We need to avoid memory copy here.
   memcpy(shandle.dptr, def_mem->get_data_handle(), def_pd.get_size());
-  mkl_mem_.reset(new mkldnn::memory(def_pd, shandle.dptr));
+  mkl_mem_ = nullptr;
+}
+
+void NDArray::Chunk::MKLDNNDataReorder(const mkldnn::memory::primitive_desc 
&pd) {
+  // If the memory already uses the specified layout, don't do anything.
+  if (mkl_mem_ != nullptr && mkl_mem_->get_primitive_desc() == pd)
+return;
+  auto _pd = pd;
+  auto _desc = _pd.desc();
+  auto def_format = GetDefaultFormat(_desc);
+  // If the memory is default, don't do anything.
+  if (def_format == _desc.data.format && IsDefault())
+return;
+  // If the specified layout is default, we should use Reorder2Default.
+  if (def_format == _desc.data.format) {
+Reorder2Default();
+return;
+  }
+
+  std::shared_ptr new_mem(new mkldnn::memory(pd));
+  std::shared_ptr old_mem;
+  if (IsDefault()) {
+auto def_pd = GetPrimitiveDesc(pd, def_format);
+old_mem.reset(new mkldnn::memory(def_pd, shandle.dptr));
+  } else {
+old_mem = this->mkl_mem_;
+  }
+  CHECK(old_mem->get_primitive_desc().desc().data.ndims == _desc.data.ndims);
+
+  // This may be called in MKLDNN operators. We can't use MKLDNNStream here.
+  std::vector net;
+  net.push_back(mkldnn::reorder(*old_mem, *new_mem));
+  mkldnn::stream(mkldnn::stream::kind::eager).submit(net).wait();
+
+  CHECK(shandle.size >= pd.get_size());
+  CheckAndAlloc(pd.get_size());
+  // TODO(zhengda) We need to avoid memory copy here.
 
 Review comment:
   This is from the previous PR. I just moved code here.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] zheng-da commented on a change in pull request #9862: Fix a race condition in converting data layouts in MKLDNN.

2018-02-23 Thread GitBox
zheng-da commented on a change in pull request #9862: Fix a race condition in 
converting data layouts in MKLDNN.
URL: https://github.com/apache/incubator-mxnet/pull/9862#discussion_r170326873
 
 

 ##
 File path: src/ndarray/ndarray.cc
 ##
 @@ -1017,6 +1017,7 @@ inline void CopyFromToDnsImpl(const NDArray& from, const 
NDArray& to, RunContext
 // with Copy().
 NDArray tmp_from = from;
 if (tmp_from.IsMKLDNNData()) {
+  // TODO(zhengda) tmp_from should be cached.
 
 Review comment:
   @larroy I agree with you that the current implementation of NDArray to 
handle storages is messy. What you said requires a rewrite of NDArray. 
@piiswrong and I talked about this, but I don't think this should be in this 
PR. We need to be very careful when rewriting NDArray, considering NDArray is 
one of the core components in MXNet.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] zheng-da commented on a change in pull request #9862: Fix a race condition in converting data layouts in MKLDNN.

2018-02-22 Thread GitBox
zheng-da commented on a change in pull request #9862: Fix a race condition in 
converting data layouts in MKLDNN.
URL: https://github.com/apache/incubator-mxnet/pull/9862#discussion_r170165614
 
 

 ##
 File path: src/ndarray/ndarray.cc
 ##
 @@ -1017,6 +1017,7 @@ inline void CopyFromToDnsImpl(const NDArray& from, const 
NDArray& to, RunContext
 // with Copy().
 NDArray tmp_from = from;
 if (tmp_from.IsMKLDNNData()) {
+  // TODO(zhengda) tmp_from should be cached.
 
 Review comment:
   const can have a lot of definitions. I don't think it's a bad design. here 
const means the values of the array are always the same, even though its layout 
may be changed. if we strictly follow the rule that const function doesn't 
modify data in an array, the code would become unmanagable. In fact, the data 
in a const array in MXNet is modified all the time. 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] zheng-da commented on a change in pull request #9862: Fix a race condition in converting data layouts in MKLDNN.

2018-02-22 Thread GitBox
zheng-da commented on a change in pull request #9862: Fix a race condition in 
converting data layouts in MKLDNN.
URL: https://github.com/apache/incubator-mxnet/pull/9862#discussion_r170130612
 
 

 ##
 File path: src/ndarray/ndarray.cc
 ##
 @@ -1017,6 +1017,7 @@ inline void CopyFromToDnsImpl(const NDArray& from, const 
NDArray& to, RunContext
 // with Copy().
 NDArray tmp_from = from;
 if (tmp_from.IsMKLDNNData()) {
+  // TODO(zhengda) tmp_from should be cached.
 
 Review comment:
   The reason I ask is that I saw TODO in many places in the code. Are all 
TODOs tracked in github issues? Or is this the new practice we want to push? 
And is github issue the best place for us to keep track of these TODOs? I 
remember Sheng has a script that closes issues if they aren't inactive for 30 
days or so.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] zheng-da commented on a change in pull request #9862: Fix a race condition in converting data layouts in MKLDNN.

2018-02-22 Thread GitBox
zheng-da commented on a change in pull request #9862: Fix a race condition in 
converting data layouts in MKLDNN.
URL: https://github.com/apache/incubator-mxnet/pull/9862#discussion_r170129335
 
 

 ##
 File path: src/operator/nn/mkldnn/mkldnn_base.cc
 ##
 @@ -270,9 +270,24 @@ void FallBackCompute(FCompute fn, const nnvm::NodeAttrs 
&attrs,
  const std::vector &req,
  const std::vector &outputs) {
   std::vector in_blobs(inputs.size());
+  std::vector in_bufs;
   for (size_t i = 0; i < in_blobs.size(); i++) {
+// If the input data isn't stored in the default format, we shouldn't
+// call data() directly, which will change the layout of the NDArray.
+// Instead, we should save the converted data in another NDArray.
+// TODO(zhengda) we should use temp space to save the converted data.
+if (inputs[i].IsDefaultData()) {
   in_blobs[i] = inputs[i].data();
+} else {
+  in_bufs.emplace_back(inputs[i].shape(), inputs[i].ctx(),
 
 Review comment:
   I saw reserve() being used in many places in MXNet. I'm always wondering how 
much benefit it brings. Or is this a convention?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] zheng-da commented on a change in pull request #9862: Fix a race condition in converting data layouts in MKLDNN.

2018-02-22 Thread GitBox
zheng-da commented on a change in pull request #9862: Fix a race condition in 
converting data layouts in MKLDNN.
URL: https://github.com/apache/incubator-mxnet/pull/9862#discussion_r170114168
 
 

 ##
 File path: tests/python/gpu/test_gluon_model_zoo_gpu.py
 ##
 @@ -37,11 +37,11 @@ def download_data():
 return mx.test_utils.download(
 'http://data.mxnet.io/data/val-5k-256.rec', VAL_DATA)
 
-@unittest.skip("test fails intermittently. temporarily disabled.")
 @with_seed()
 def test_inference():
 all_models = ['resnet50_v1', 'vgg19_bn', 'alexnet', #'inceptionv3',
   'densenet201', 'squeezenet1.0', 'mobilenet0.25']
+mx.random.seed(2)
 
 Review comment:
   Does with_seed() set the same seed for each run?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] zheng-da commented on a change in pull request #9862: Fix a race condition in converting data layouts in MKLDNN.

2018-02-22 Thread GitBox
zheng-da commented on a change in pull request #9862: Fix a race condition in 
converting data layouts in MKLDNN.
URL: https://github.com/apache/incubator-mxnet/pull/9862#discussion_r170114324
 
 

 ##
 File path: src/ndarray/ndarray.cc
 ##
 @@ -1017,6 +1017,7 @@ inline void CopyFromToDnsImpl(const NDArray& from, const 
NDArray& to, RunContext
 // with Copy().
 NDArray tmp_from = from;
 if (tmp_from.IsMKLDNNData()) {
+  // TODO(zhengda) tmp_from should be cached.
 
 Review comment:
   Create a github issue for this?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services