[GitHub] piiswrong commented on a change in pull request #9681: Better Exception Handling for Operators

2018-02-05 Thread GitBox
piiswrong commented on a change in pull request #9681: Better Exception 
Handling for Operators
URL: https://github.com/apache/incubator-mxnet/pull/9681#discussion_r166079802
 
 

 ##
 File path: src/engine/threaded_engine.h
 ##
 @@ -338,33 +346,46 @@ class ThreadedEngine : public Engine {
 #endif
 CallbackOnComplete callback = this->CreateCallback(
 ThreadedEngine::OnCompleteStatic, opr_block);
+CallbackOnComplete on_start_callback = this->CreateCallback(
 
 Review comment:
   I think this is unnecessary overhead. Call OnStart directly if possible


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] piiswrong commented on a change in pull request #9681: Better Exception Handling for Operators

2018-02-02 Thread GitBox
piiswrong commented on a change in pull request #9681: Better Exception 
Handling for Operators
URL: https://github.com/apache/incubator-mxnet/pull/9681#discussion_r165731530
 
 

 ##
 File path: src/engine/threaded_engine.h
 ##
 @@ -476,6 +507,9 @@ class ThreadedEngine : public Engine {
*/
   std::mutex finished_m_;
   std::condition_variable finished_cv_;
+  /*! \brief exception_ptr associated with the engine,
+   * which is used to throw exception in waitall */
+  std::exception_ptr global_ex_ptr;
 
 Review comment:
   ex_ptr is a bad name. class members should end with `_`.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] piiswrong commented on a change in pull request #9681: Better Exception Handling for Operators

2018-02-02 Thread GitBox
piiswrong commented on a change in pull request #9681: Better Exception 
Handling for Operators
URL: https://github.com/apache/incubator-mxnet/pull/9681#discussion_r165731112
 
 

 ##
 File path: src/storage/gpu_device_storage.h
 ##
 @@ -62,7 +62,7 @@ inline void* GPUDeviceStorage::Alloc(size_t size) {
 #endif  // MXNET_USE_NCCL
   cudaError_t e = cudaMalloc(, size);
   if (e != cudaSuccess && e != cudaErrorCudartUnloading)
-throw std::bad_alloc();
+LOG(FATAL) << cudaGetLastError();
 
 Review comment:
   Does cudaGetLastError return a string or an error code?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] piiswrong commented on a change in pull request #9681: Better Exception Handling for Operators

2018-02-02 Thread GitBox
piiswrong commented on a change in pull request #9681: Better Exception 
Handling for Operators
URL: https://github.com/apache/incubator-mxnet/pull/9681#discussion_r165731019
 
 

 ##
 File path: src/storage/cpu_device_storage.h
 ##
 @@ -61,10 +61,10 @@ inline void* CPUDeviceStorage::Alloc(size_t size) {
   void* ptr;
 #if _MSC_VER
   ptr = _aligned_malloc(size, alignment_);
-  if (ptr == NULL) throw std::bad_alloc();
+  if (ptr == NULL) LOG(FATAL) << "Malloc failure";
 
 Review comment:
   Failed to allocate CPU memory


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] piiswrong commented on a change in pull request #9681: Better Exception Handling for Operators

2018-02-02 Thread GitBox
piiswrong commented on a change in pull request #9681: Better Exception 
Handling for Operators
URL: https://github.com/apache/incubator-mxnet/pull/9681#discussion_r165730863
 
 

 ##
 File path: src/engine/threaded_engine.h
 ##
 @@ -338,33 +346,46 @@ class ThreadedEngine : public Engine {
 #endif
 CallbackOnComplete callback = this->CreateCallback(
 ThreadedEngine::OnCompleteStatic, opr_block);
+CallbackOnComplete on_start_callback = this->CreateCallback(
+ThreadedEngine::OnStartStatic, opr_block);
 bool debug_info = (engine_info_ && debug_push_opr_ == opr_block);
 if (debug_info) {
   LOG(INFO) << "ExecuteOprBlock " << opr_block
 << "shutdown_phase=" << shutdown_phase_;
 }
 if (!shutdown_phase_) {
   try {
+on_start_callback();
 if (debug_info) {
   LOG(INFO) << "ExecuteOprFn ";
 }
-threaded_opr->fn(run_ctx, callback);
+try {
+  if (!threaded_opr->ex_ptr || threaded_opr->wait) {
+threaded_opr->fn(run_ctx, callback);
+  } else {
+callback();
+  }
+} catch (dmlc::Error& e) {
+  threaded_opr->ex_ptr = std::current_exception();
+  callback();
+}
 if (debug_info) {
   LOG(INFO) << "Fin ExecuteOprFn ";
 }
-  } catch(dmlc::Error ) {
+  } catch (dmlc::Error& e) {
 
 Review comment:
   what would this catch now? There is already a try block inside


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] piiswrong commented on a change in pull request #9681: Better Exception Handling for Operators

2018-02-02 Thread GitBox
piiswrong commented on a change in pull request #9681: Better Exception 
Handling for Operators
URL: https://github.com/apache/incubator-mxnet/pull/9681#discussion_r165728531
 
 

 ##
 File path: src/engine/threaded_engine.h
 ##
 @@ -338,33 +346,46 @@ class ThreadedEngine : public Engine {
 #endif
 CallbackOnComplete callback = this->CreateCallback(
 ThreadedEngine::OnCompleteStatic, opr_block);
+CallbackOnComplete on_start_callback = this->CreateCallback(
+ThreadedEngine::OnStartStatic, opr_block);
 bool debug_info = (engine_info_ && debug_push_opr_ == opr_block);
 if (debug_info) {
   LOG(INFO) << "ExecuteOprBlock " << opr_block
 << "shutdown_phase=" << shutdown_phase_;
 }
 if (!shutdown_phase_) {
   try {
+on_start_callback();
 if (debug_info) {
   LOG(INFO) << "ExecuteOprFn ";
 }
-threaded_opr->fn(run_ctx, callback);
+try {
+  if (!threaded_opr->ex_ptr || threaded_opr->wait) {
+threaded_opr->fn(run_ctx, callback);
+  } else {
+callback();
+  }
+} catch (dmlc::Error& e) {
 
 Review comment:
   why only catch dmlc error?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] piiswrong commented on a change in pull request #9681: Better Exception Handling for Operators

2018-02-02 Thread GitBox
piiswrong commented on a change in pull request #9681: Better Exception 
Handling for Operators
URL: https://github.com/apache/incubator-mxnet/pull/9681#discussion_r165728359
 
 

 ##
 File path: src/engine/threaded_engine.h
 ##
 @@ -338,33 +346,46 @@ class ThreadedEngine : public Engine {
 #endif
 CallbackOnComplete callback = this->CreateCallback(
 ThreadedEngine::OnCompleteStatic, opr_block);
+CallbackOnComplete on_start_callback = this->CreateCallback(
 
 Review comment:
   what's the point of creating a call back here?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] piiswrong commented on a change in pull request #9681: Better Exception Handling for Operators

2018-02-02 Thread GitBox
piiswrong commented on a change in pull request #9681: Better Exception 
Handling for Operators
URL: https://github.com/apache/incubator-mxnet/pull/9681#discussion_r165727816
 
 

 ##
 File path: src/engine/threaded_engine.h
 ##
 @@ -175,6 +175,8 @@ class ThreadedVar final
   static std::atomic counter;
   ~ThreadedVar() { LOG(INFO) << __func__ << " " << --counter; }
 #endif  // ENGINE_DEBUG
+  /*! \brief exception_ptr associated with the ThreadedVar */
+  std::exception_ptr ex_ptr;
 
 Review comment:
   ex_ptr is a bad variable name


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] piiswrong commented on a change in pull request #9681: Better Exception Handling for Operators

2018-02-02 Thread GitBox
piiswrong commented on a change in pull request #9681: Better Exception 
Handling for Operators
URL: https://github.com/apache/incubator-mxnet/pull/9681#discussion_r165727071
 
 

 ##
 File path: src/engine/threaded_engine.cc
 ##
 @@ -403,7 +417,11 @@ inline void ThreadedEngine::OnComplete(ThreadedOpr* 
threaded_opr) {
   }
   // Mark complete for write variables.
   for (auto&& i : threaded_opr->mutable_vars) {
-const bool debug_info = (engine_info_ && debug_wait_var_ == i);
+if (threaded_opr->ex_ptr) {
+  i->ex_ptr = threaded_opr->ex_ptr;
+  if (!global_ex_ptr) global_ex_ptr = i->ex_ptr;
+}
 
 Review comment:
   suppose and operator has three outputs x, y and z and it raises an exception.
   then x.asnumpy() would raise an error.
   Then y.asnumpy() would raise the same error again.
   
   and if I do z += 1 and it succeeds, z.asnumpy() would still raise the error


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] piiswrong commented on a change in pull request #9681: Better Exception Handling for Operators

2018-02-02 Thread GitBox
piiswrong commented on a change in pull request #9681: Better Exception 
Handling for Operators
URL: https://github.com/apache/incubator-mxnet/pull/9681#discussion_r165727071
 
 

 ##
 File path: src/engine/threaded_engine.cc
 ##
 @@ -403,7 +417,11 @@ inline void ThreadedEngine::OnComplete(ThreadedOpr* 
threaded_opr) {
   }
   // Mark complete for write variables.
   for (auto&& i : threaded_opr->mutable_vars) {
-const bool debug_info = (engine_info_ && debug_wait_var_ == i);
+if (threaded_opr->ex_ptr) {
+  i->ex_ptr = threaded_opr->ex_ptr;
+  if (!global_ex_ptr) global_ex_ptr = i->ex_ptr;
+}
 
 Review comment:
   suppose and operator has two outputs x and y and it raises an exception.
   then x.asnumpy() would raise an error.
   Then y.asnumpy() would raise the same error again.
   
   and if I do x += 1 and it succeeds, x.asnumpy() would still raise the error


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] piiswrong commented on a change in pull request #9681: Better Exception Handling for Operators

2018-02-02 Thread GitBox
piiswrong commented on a change in pull request #9681: Better Exception 
Handling for Operators
URL: https://github.com/apache/incubator-mxnet/pull/9681#discussion_r165727071
 
 

 ##
 File path: src/engine/threaded_engine.cc
 ##
 @@ -403,7 +417,11 @@ inline void ThreadedEngine::OnComplete(ThreadedOpr* 
threaded_opr) {
   }
   // Mark complete for write variables.
   for (auto&& i : threaded_opr->mutable_vars) {
-const bool debug_info = (engine_info_ && debug_wait_var_ == i);
+if (threaded_opr->ex_ptr) {
+  i->ex_ptr = threaded_opr->ex_ptr;
+  if (!global_ex_ptr) global_ex_ptr = i->ex_ptr;
+}
 
 Review comment:
   suppose and operator has two outputs x and y and it raises an exception.
   then x.asnumpy() would raise an error.
   Then y.asnumpy() would raise the same error again.
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] piiswrong commented on a change in pull request #9681: Better Exception Handling for Operators

2018-02-02 Thread GitBox
piiswrong commented on a change in pull request #9681: Better Exception 
Handling for Operators
URL: https://github.com/apache/incubator-mxnet/pull/9681#discussion_r165726239
 
 

 ##
 File path: src/engine/threaded_engine.cc
 ##
 @@ -403,7 +417,11 @@ inline void ThreadedEngine::OnComplete(ThreadedOpr* 
threaded_opr) {
   }
   // Mark complete for write variables.
   for (auto&& i : threaded_opr->mutable_vars) {
-const bool debug_info = (engine_info_ && debug_wait_var_ == i);
+if (threaded_opr->ex_ptr) {
+  i->ex_ptr = threaded_opr->ex_ptr;
+  if (!global_ex_ptr) global_ex_ptr = i->ex_ptr;
+}
+bool debug_info = (engine_info_ && debug_wait_var_ == i);
 
 Review comment:
   why remove const?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] piiswrong commented on a change in pull request #9681: Better Exception Handling for Operators

2018-02-02 Thread GitBox
piiswrong commented on a change in pull request #9681: Better Exception 
Handling for Operators
URL: https://github.com/apache/incubator-mxnet/pull/9681#discussion_r165726063
 
 

 ##
 File path: src/engine/threaded_engine.cc
 ##
 @@ -391,6 +400,11 @@ void ThreadedEngine::WaitForAll() {
   finished_cv_.wait(lock, [this]() {
   return pending_.load() == 0 || kill_.load();
 });
+  if (global_ex_ptr) {
+std::exception_ptr ex_ptr = global_ex_ptr;
 
 Review comment:
   use std::rethrow_exception(std::move(global_ex_ptr))


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] piiswrong commented on a change in pull request #9681: Better Exception Handling for Operators

2018-02-02 Thread GitBox
piiswrong commented on a change in pull request #9681: Better Exception 
Handling for Operators
URL: https://github.com/apache/incubator-mxnet/pull/9681#discussion_r165726063
 
 

 ##
 File path: src/engine/threaded_engine.cc
 ##
 @@ -391,6 +400,11 @@ void ThreadedEngine::WaitForAll() {
   finished_cv_.wait(lock, [this]() {
   return pending_.load() == 0 || kill_.load();
 });
+  if (global_ex_ptr) {
+std::exception_ptr ex_ptr = global_ex_ptr;
 
 Review comment:
   use std::move


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] piiswrong commented on a change in pull request #9681: Better Exception Handling for Operators

2018-02-02 Thread GitBox
piiswrong commented on a change in pull request #9681: Better Exception 
Handling for Operators
URL: https://github.com/apache/incubator-mxnet/pull/9681#discussion_r165725747
 
 

 ##
 File path: src/engine/threaded_engine.cc
 ##
 @@ -356,7 +357,11 @@ void ThreadedEngine::DeleteVariable(SyncFn delete_fn,
 void ThreadedEngine::WaitForVar(VarHandle var) {
   BulkFlush();
   ThreadedVar* threaded_var = ThreadedVar::CastFromBase(var);
-  if (threaded_var->ready_to_read()) return;
+  if (threaded_var->ready_to_read()) {
+if (threaded_var->ex_ptr) {
+  std::rethrow_exception(threaded_var->ex_ptr);
+}
+  }
 
 Review comment:
   why are you not returning?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] piiswrong commented on a change in pull request #9681: Better Exception Handling for Operators

2018-02-02 Thread GitBox
piiswrong commented on a change in pull request #9681: Better Exception 
Handling for Operators
URL: https://github.com/apache/incubator-mxnet/pull/9681#discussion_r165725474
 
 

 ##
 File path: include/mxnet/engine.h
 ##
 @@ -182,7 +182,7 @@ class MXNET_API Engine {
  std::vector const& mutable_vars,
  FnProperty prop = FnProperty::kNormal,
  int priority = 0,
- const char* opr_name = nullptr) = 0;
+ const char* opr_name = nullptr, bool wait = false) = 
0;
 
 Review comment:
   what's wait? Why do you need it? Please document arguments


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services