azai91 commented on issue #10994: MKLDNN fails in the backward computation when forward runs with is_train=False
URL: https://github.com/apache/incubator-mxnet/issues/10994#issuecomment-404974972
I think this issue might not be specific to MKLDNN. I built without MKLDNN:
```
ubuntu@ip-172-31-11-93:~/incubator-mxnet-original/build$ cmake -DUSE_CUDNN=ON -DUSE_CUDA=ON -DBLAS=Open -GNinja -DCMAKE_BUILD_TYPE=Debug .. && ninja
```
and I still get the same failure:
```
ubuntu@ip-172-31-11-93:~/incubator-mxnet-original$ nosetests tests/python/unittest/test_gluon.py:check_hybrid_static_memory
/usr/local/lib/python3.5/dist-packages/nose/util.py:453: DeprecationWarning: inspect.getargspec() is deprecated, use inspect.signature() instead
inspect.getargspec(func)
[INFO] Setting module np/mx/python random seeds, use MXNET_MODULE_SEED=341183070 to reproduce.
[23:03:58] src/operator/nn/mkldnn/mkldnn_base.cc:73: Allocate 147456 bytes with malloc directly
terminate called after throwing an instance of 'dmlc::Error'
what(): [23:03:58] src/engine/./threaded_engine.h:379: std::exception
A fatal error occurred in asynchronous engine operation. If you do not know what caused this error, you can try set environment variable MXNET_ENGINE_TYPE to NaiveEngine and run with debugger (i.e. gdb). This will force all operations to be synchronous and backtrace will give you the series of calls that lead to this error. Remember to set MXNET_ENGINE_TYPE back to empty after debugging.
Stack trace returned 8 entries:
[bt] (0) /home/ubuntu/incubator-mxnet-original/python/mxnet/../../lib/libmxnet.so(dmlc::StackTrace[abi:cxx11]()+0x1bc) [0x7f2eadfbfadc]
[bt] (1) /home/ubuntu/incubator-mxnet-original/python/mxnet/../../lib/libmxnet.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x28) [0x7f2eadfc0e58]
[bt] (2) /home/ubuntu/incubator-mxnet-original/python/mxnet/../../lib/libmxnet.so(mxnet::engine::ThreadedEngine::ExecuteOprBlock(mxnet::RunContext, mxnet::engine::OprBlock*)+0xfa9) [0x7f2eb0cb4619]
[bt] (3) /home/ubuntu/incubator-mxnet-original/python/mxnet/../../lib/libmxnet.so(std::_Function_handler), mxnet::engine::ThreadedEnginePerDevice::PushToExecute(mxnet::engine::OprBlock*, bool)::{lambda()#1}::operator()() const::{lambda(std::shared_ptr)#1}>::_M_invoke(std::_Any_data const&, std::shared_ptr&&)+0xe2) [0x7f2eb0ccb102]
[bt] (4) /home/ubuntu/incubator-mxnet-original/python/mxnet/../../lib/libmxnet.so(std::thread::_Impl)> (std::shared_ptr)> >::_M_run()+0x4a) [0x7f2eb0cb355a]
[bt] (5) /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xb8c80) [0x7f2e87a43c80]
[bt] (6) /lib/x86_64-linux-gnu/libpthread.so.0(+0x76ba) [0x7f2f122586ba]
[bt] (7) /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7f2f11f8e41d]
Aborted (core dumped)
```
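The error message recommends rerunning with `MXNET_ENGINE_TYPE=NaiveEngine` under a debugger, and the `[INFO]` line gives `MXNET_MODULE_SEED=341183070` for reproducing the run. Below is a minimal Python sketch of that rerun, assuming it is started from the repository root and that `nose` is installed for the same interpreter. The helpers `build_repro_env`, `build_repro_cmd` and `run_repro` are hypothetical names for illustration, not part of MXNet.

```python
import os
import sys
import subprocess

# Values copied from the log above; change them if your run reports a different seed.
TEST_SELECTOR = "tests/python/unittest/test_gluon.py:check_hybrid_static_memory"
MODULE_SEED = "341183070"


def build_repro_env(base_env=None, naive_engine=True):
    """Return a new environment dict for rerunning the failing test (input is not mutated)."""
    env = dict(os.environ if base_env is None else base_env)
    # Pin the seed reported by the "[INFO] Setting module ... random seeds" line.
    env["MXNET_MODULE_SEED"] = MODULE_SEED
    if naive_engine:
        # NaiveEngine runs every operation synchronously, so a debugger backtrace
        # shows the call chain that raised instead of only the engine worker thread.
        env["MXNET_ENGINE_TYPE"] = "NaiveEngine"
    else:
        # Leave MXNET_ENGINE_TYPE unset, as the error message advises after debugging.
        env.pop("MXNET_ENGINE_TYPE", None)
    return env


def build_repro_cmd(use_gdb=False):
    """Return the argv list that runs only the failing test through nose."""
    cmd = [sys.executable, "-m", "nose", "-v", TEST_SELECTOR]
    if use_gdb:
        # Start the interpreter under gdb and run immediately; type `bt` after the abort.
        cmd = ["gdb", "-ex", "run", "--args"] + cmd
    return cmd


def run_repro(use_gdb=False):
    """Run the test (from the repository root) and return the process exit code."""
    return subprocess.call(build_repro_cmd(use_gdb), env=build_repro_env())
```

With this sketch, `run_repro(use_gdb=True)` should stop inside gdb when the process aborts, where `bt` prints a synchronous backtrace; plain `run_repro()` simply reruns the test with the same seed.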
This is an automated message from the Apache Git Service.