[GitHub] azai91 commented on issue #10994: MKLDNN fails in the backward computation when forward runs with is_train=False

2018-08-30 Thread GitBox
azai91 commented on issue #10994: MKLDNN fails in the backward computation when 
forward runs with is_train=False
URL: 
https://github.com/apache/incubator-mxnet/issues/10994#issuecomment-417379102
 
 
   The docs mention that this flag is only used by a few operators, such as 
batch_norm 
(https://mxnet.incubator.apache.org/api/python/autograd/autograd.html#mxnet.autograd.record).
   
   I will make MKLDNN fall back in this case. The flag is probably something 
that should be removed, since it can be inferred.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] azai91 commented on issue #10994: MKLDNN fails in the backward computation when forward runs with is_train=False

2018-07-13 Thread GitBox
azai91 commented on issue #10994: MKLDNN fails in the backward computation when 
forward runs with is_train=False
URL: 
https://github.com/apache/incubator-mxnet/issues/10994#issuecomment-404974972
 
 
   I think this issue might not be specific to MKLDNN.
   I built without MKLDNN:
   ```
   ubuntu@ip-172-31-11-93:~/incubator-mxnet-original/build$ cmake -DUSE_CUDNN=ON -DUSE_CUDA=ON -DBLAS=Open -GNinja -DCMAKE_BUILD_TYPE=Debug .. && ninja
   ```
   
   and I still hit the issue:
   ```
   ubuntu@ip-172-31-11-93:~/incubator-mxnet-original$ nosetests tests/python/unittest/test_gluon.py:check_hybrid_static_memory
   /usr/local/lib/python3.5/dist-packages/nose/util.py:453: DeprecationWarning: inspect.getargspec() is deprecated, use inspect.signature() instead
     inspect.getargspec(func)
   [INFO] Setting module np/mx/python random seeds, use MXNET_MODULE_SEED=341183070 to reproduce.
   [23:03:58] src/operator/nn/mkldnn/mkldnn_base.cc:73: Allocate 147456 bytes with malloc directly
   terminate called after throwing an instance of 'dmlc::Error'
     what():  [23:03:58] src/engine/./threaded_engine.h:379: std::exception
   A fatal error occurred in asynchronous engine operation. If you do not know what caused this error, you can try set environment variable MXNET_ENGINE_TYPE to NaiveEngine and run with debugger (i.e. gdb). This will force all operations to be synchronous and backtrace will give you the series of calls that lead to this error. Remember to set MXNET_ENGINE_TYPE back to empty after debugging.
   
   Stack trace returned 8 entries:
   [bt] (0) 
/home/ubuntu/incubator-mxnet-original/python/mxnet/../../lib/libmxnet.so(dmlc::StackTrace[abi:cxx11]()+0x1bc)
 [0x7f2eadfbfadc]
   [bt] (1) 
/home/ubuntu/incubator-mxnet-original/python/mxnet/../../lib/libmxnet.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x28)
 [0x7f2eadfc0e58]
   [bt] (2) 
/home/ubuntu/incubator-mxnet-original/python/mxnet/../../lib/libmxnet.so(mxnet::engine::ThreadedEngine::ExecuteOprBlock(mxnet::RunContext,
 mxnet::engine::OprBlock*)+0xfa9) [0x7f2eb0cb4619]
   [bt] (3) 
/home/ubuntu/incubator-mxnet-original/python/mxnet/../../lib/libmxnet.so(std::_Function_handler), 
mxnet::engine::ThreadedEnginePerDevice::PushToExecute(mxnet::engine::OprBlock*, 
bool)::{lambda()#1}::operator()() 
const::{lambda(std::shared_ptr)#1}>::_M_invoke(std::_Any_data
 const&, std::shared_ptr&&)+0xe2) [0x7f2eb0ccb102]
   [bt] (4) 
/home/ubuntu/incubator-mxnet-original/python/mxnet/../../lib/libmxnet.so(std::thread::_Impl)> (std::shared_ptr)> 
>::_M_run()+0x4a) [0x7f2eb0cb355a]
   [bt] (5) /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xb8c80) [0x7f2e87a43c80]
   [bt] (6) /lib/x86_64-linux-gnu/libpthread.so.0(+0x76ba) [0x7f2f122586ba]
   [bt] (7) /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7f2f11f8e41d]
   
   
   Aborted (core dumped)
   ```
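
The error message in the log above suggests rerunning with MXNET_ENGINE_TYPE set to NaiveEngine so that operations run synchronously and the backtrace points at the failing call. A minimal sketch of one way to do that for the same test, assuming `nosetests` is on the PATH; the helper names `naive_engine_env` and `run_test_synchronously` are hypothetical and not part of MXNet, and this does not attach gdb:

```python
import os
import subprocess

# Test target taken from the nosetests command in the log above.
TEST_TARGET = "tests/python/unittest/test_gluon.py:check_hybrid_static_memory"


def naive_engine_env(base_env=None):
    """Return a copy of the environment with MXNET_ENGINE_TYPE=NaiveEngine.

    Hypothetical helper: it only builds the environment dict and does not
    modify os.environ, so the setting does not leak into later runs.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["MXNET_ENGINE_TYPE"] = "NaiveEngine"
    return env


def run_test_synchronously(target=TEST_TARGET):
    """Run the failing test in a child process with the synchronous engine.

    Assumes the current directory is the MXNet source root and that
    nosetests is installed; returns the child's exit code.
    """
    result = subprocess.run(["nosetests", target], env=naive_engine_env())
    return result.returncode
```

Because the variable is set only in the child process environment, there is nothing to reset after debugging. Calling `run_test_synchronously()` from the source root reruns the test shown above.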




[GitHub] azai91 commented on issue #10994: MKLDNN fails in the backward computation when forward runs with is_train=False

2018-07-13 Thread GitBox
azai91 commented on issue #10994: MKLDNN fails in the backward computation when 
forward runs with is_train=False
URL: 
https://github.com/apache/incubator-mxnet/issues/10994#issuecomment-404970514
 
 
   @zheng-da do we know which operators cause this?

