sxjscience commented on issue #9171: MXNet: Using FusedRNNCell with its "bidirectional" flag turned True, can lead to hanging of training run.
URL: https://github.com/apache/incubator-mxnet/issues/9171#issuecomment-379421985
The error message:
```
Traceback (most recent call last):
  File "sentiment_analysis.py", line 270, in
    train(args)
  File "sentiment_analysis.py", line 261, in train
    test_avg_L, test_acc = evaluate(net, test_dataloader, context)
  File "sentiment_analysis.py", line 136, in evaluate
    total_L += L.sum().asscalar()
  File "/home/ubuntu/mxnet/python/mxnet/ndarray/ndarray.py", line 1844, in asscalar
    return self.asnumpy()[0]
  File "/home/ubuntu/mxnet/python/mxnet/ndarray/ndarray.py", line 1826, in asnumpy
    ctypes.c_size_t(data.size)))
  File "/home/ubuntu/mxnet/python/mxnet/base.py", line 149, in check_call
    raise MXNetError(py_str(_LIB.MXGetLastError()))
mxnet.base.MXNetError: [22:08:57] src/operator/./cudnn_rnn-inl.h:457: Check failed: e == CUDNN_STATUS_SUCCESS (8 vs. 0) cuDNN: CUDNN_STATUS_EXECUTION_FAILED
Stack trace returned 10 entries:
[bt] (0) /home/ubuntu/mxnet/python/mxnet/../../lib/libmxnet.so(dmlc::StackTrace[abi:cxx11]()+0x5b) [0x7f4cee092c5b]
[bt] (1) /home/ubuntu/mxnet/python/mxnet/../../lib/libmxnet.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x28) [0x7f4cee093798]
[bt] (2) /home/ubuntu/mxnet/python/mxnet/../../lib/libmxnet.so(mxnet::op::CuDNNRNNOp::Init(mshadow::Stream*, std::vector > const&, std::vector > const&)+0x2142) [0x7f4cf27b4f22]
[bt] (3) /home/ubuntu/mxnet/python/mxnet/../../lib/libmxnet.so(mxnet::op::CuDNNRNNOp::Forward(mxnet::OpContext const&, std::vector > const&, std::vector > const&, std::vector > const&, std::vector > const&)+0xa5d) [0x7f4cf27c2f1d]
```