Harold-Zhang opened a new issue #7958: Engine shutdown
URL: https://github.com/apache/incubator-mxnet/issues/7958
 
 
   ## Environment info
   Operating System:
   Ubuntu 14.04
   Compiler:
   gcc 4.8.4
   Package used (Python/R/Scala/Julia):
   Python 2.7
   MXNet version:
   The latest version (the stack trace below shows mxnet-0.11.1)
   GPU:
   Tesla K40m
   
   
   ## Error Message:
   
   [19:43:24] src/nnvm/legacy_json_util.cc:198: Symbol successfully upgraded!
   2017-09-19 19:43:45,834 - Epoch[0] Batch [20]        Speed: 4.56 samples/sec accuracy=0.964286
   2017-09-19 19:44:03,379 - Epoch[0] Batch [40]        Speed: 4.56 samples/sec accuracy=1.000000
   2017-09-19 19:44:20,942 - Epoch[0] Batch [60]        Speed: 4.56 samples/sec accuracy=1.000000
   2017-09-19 19:44:38,747 - Epoch[0] Batch [80]        Speed: 4.49 samples/sec accuracy=1.000000
   2017-09-19 19:44:56,319 - Epoch[0] Batch [100]       Speed: 4.55 samples/sec accuracy=1.000000
   2017-09-19 19:45:13,862 - Epoch[0] Batch [120]       Speed: 4.56 samples/sec accuracy=1.000000
   2017-09-19 19:45:31,494 - Epoch[0] Batch [140]       Speed: 4.54 samples/sec accuracy=1.000000
   2017-09-19 19:45:49,110 - Epoch[0] Batch [160]       Speed: 4.54 samples/sec accuracy=1.000000
   2017-09-19 19:46:06,677 - Epoch[0] Batch [180]       Speed: 4.55 samples/sec accuracy=1.000000
   2017-09-19 19:46:24,257 - Epoch[0] Batch [200]       Speed: 4.55 samples/sec accuracy=1.000000
   2017-09-19 19:46:41,886 - Epoch[0] Batch [220]       Speed: 4.54 samples/sec accuracy=1.000000
   2017-09-19 19:46:59,501 - Epoch[0] Batch [240]       Speed: 4.54 samples/sec accuracy=1.000000
   2017-09-19 19:47:17,085 - Epoch[0] Batch [260]       Speed: 4.55 samples/sec accuracy=1.000000
   2017-09-19 19:47:34,667 - Epoch[0] Batch [280]       Speed: 4.55 samples/sec accuracy=1.000000
   2017-09-19 19:47:52,273 - Epoch[0] Batch [300]       Speed: 4.54 samples/sec accuracy=1.000000
   2017-09-19 19:48:09,861 - Epoch[0] Batch [320]       Speed: 4.55 samples/sec accuracy=1.000000
   2017-09-19 19:48:27,503 - Epoch[0] Batch [340]       Speed: 4.53 samples/sec accuracy=1.000000
   2017-09-19 19:48:45,085 - Epoch[0] Batch [360]       Speed: 4.55 samples/sec accuracy=1.000000
   2017-09-19 19:49:02,700 - Epoch[0] Batch [380]       Speed: 4.54 samples/sec accuracy=1.000000
   2017-09-19 19:49:20,358 - Epoch[0] Batch [400]       Speed: 4.53 samples/sec accuracy=1.000000
   2017-09-19 19:49:37,943 - Epoch[0] Batch [420]       Speed: 4.55 samples/sec accuracy=1.000000
   2017-09-19 19:49:55,530 - Epoch[0] Batch [440]       Speed: 4.55 samples/sec accuracy=1.000000
   2017-09-19 19:50:13,105 - Epoch[0] Batch [460]       Speed: 4.55 samples/sec accuracy=1.000000
   2017-09-19 19:50:30,683 - Epoch[0] Batch [480]       Speed: 4.55 samples/sec accuracy=1.000000
   2017-09-19 19:50:48,265 - Epoch[0] Batch [500]       Speed: 4.55 samples/sec accuracy=1.000000
   2017-09-19 19:51:05,903 - Epoch[0] Batch [520]       Speed: 4.54 samples/sec accuracy=1.000000
   2017-09-19 19:51:23,492 - Epoch[0] Batch [540]       Speed: 4.55 samples/sec accuracy=1.000000
   2017-09-19 19:51:41,176 - Epoch[0] Batch [560]       Speed: 4.52 samples/sec accuracy=1.000000
   2017-09-19 19:51:58,766 - Epoch[0] Batch [580]       Speed: 4.55 samples/sec accuracy=1.000000
   2017-09-19 19:52:16,347 - Epoch[0] Batch [600]       Speed: 4.55 samples/sec accuracy=1.000000
   2017-09-19 19:52:33,933 - Epoch[0] Batch [620]       Speed: 4.55 samples/sec accuracy=1.000000
   [19:52:38] /home/harold/mxnet/dmlc-core/include/dmlc/logging.h:308: [19:52:38] src/io/image_io.cc:165: Check failed: !dst.empty()
   
   Stack trace returned 10 entries:
   [bt] (0) /usr/local/lib/python2.7/dist-packages/mxnet-0.11.1-py2.7.egg/mxnet/libmxnet.so(_ZN4dmlc15LogMessageFatalD1Ev+0x3c) [0x7f9450196c8c]
   [bt] (1) /usr/local/lib/python2.7/dist-packages/mxnet-0.11.1-py2.7.egg/mxnet/libmxnet.so(_ZN5mxnet2io12ImdecodeImplEibPvmPNS_7NDArrayE+0x67a) [0x7f9451a3eefa]
   [bt] (2) /usr/local/lib/python2.7/dist-packages/mxnet-0.11.1-py2.7.egg/mxnet/libmxnet.so(_ZNSt17_Function_handlerIFvN5mxnet10RunContextENS0_6engine18CallbackOnCompleteEEZNS0_6Engine8PushSyncESt8functionIFvS1_EENS0_7ContextERKSt6vectorIPNS2_3VarESaISC_EESG_NS0_10FnPropertyEiPKcEUlS1_S3_E_E9_M_invokeERKSt9_Any_dataS1_S3_+0x23) [0x7f9450284963]
   [bt] (3) /usr/local/lib/python2.7/dist-packages/mxnet-0.11.1-py2.7.egg/mxnet/libmxnet.so(_ZN5mxnet6engine11NaiveEngine9PushAsyncESt8functionIFvNS_10RunContextENS0_18CallbackOnCompleteEEENS_7ContextERKSt6vectorIPNS0_3VarESaISA_EESE_NS_10FnPropertyEiPKc+0x8b) [0x7f94519daf4b]
   [bt] (4) /usr/local/lib/python2.7/dist-packages/mxnet-0.11.1-py2.7.egg/mxnet/libmxnet.so(_ZN5mxnet6Engine8PushSyncESt8functionIFvNS_10RunContextEEENS_7ContextERKSt6vectorIPNS_6engine3VarESaIS9_EESD_NS_10FnPropertyEiPKc+0x124) [0x7f9450285814]
   [bt] (5) /usr/local/lib/python2.7/dist-packages/mxnet-0.11.1-py2.7.egg/mxnet/libmxnet.so(_ZN5mxnet2io8ImdecodeERKN4nnvm9NodeAttrsERKSt6vectorINS_7NDArrayESaIS6_EEPS8_+0xc90) [0x7f9451a40a10]
   [bt] (6) /usr/local/lib/python2.7/dist-packages/mxnet-0.11.1-py2.7.egg/mxnet/libmxnet.so(_Z20ImperativeInvokeImplRKN5mxnet7ContextEON4nnvm9NodeAttrsEPSt6vectorINS_7NDArrayESaIS7_EESA_PS6_IbSaIbEESD_+0x3cf) [0x7f94519a91ff]
   [bt] (7) /usr/local/lib/python2.7/dist-packages/mxnet-0.11.1-py2.7.egg/mxnet/libmxnet.so(_Z22MXImperativeInvokeImplPviPS_PiPS0_iPPKcS5_+0x25b) [0x7f94519bb43b]
   [bt] (8) /usr/local/lib/python2.7/dist-packages/mxnet-0.11.1-py2.7.egg/mxnet/libmxnet.so(MXImperativeInvokeEx+0x2f) [0x7f94519a982f]
   [bt] (9) /usr/lib/x86_64-linux-gnu/libffi.so.6(ffi_call_unix64+0x4c) [0x7f9468629adc]
   
   Traceback (most recent call last):
     File "train.py", line 142, in <module>
       image_shape='3,224,224', epoch=0, num_epoch=args.num_epoch, kv=kv)
     File "train.py", line 106, in train_model
       epoch_end_callback=checkpoint)
     File "/usr/local/lib/python2.7/dist-packages/mxnet-0.11.1-py2.7.egg/mxnet/module/base_module.py", line 491, in fit
       next_data_batch = next(data_iter)
     File "/usr/local/lib/python2.7/dist-packages/mxnet-0.11.1-py2.7.egg/mxnet/image/image.py", line 1151, in next
       data = self.imdecode(s)
     File "/usr/local/lib/python2.7/dist-packages/mxnet-0.11.1-py2.7.egg/mxnet/image/image.py", line 1183, in imdecode
       return imdecode(s)
     File "/usr/local/lib/python2.7/dist-packages/mxnet-0.11.1-py2.7.egg/mxnet/image/image.py", line 136, in imdecode
       return _internal._cvimdecode(buf, *args, **kwargs)
     File "<string>", line 16, in _cvimdecode
     File "/usr/local/lib/python2.7/dist-packages/mxnet-0.11.1-py2.7.egg/mxnet/_ctypes/ndarray.py", line 92, in _imperative_invoke
       ctypes.byref(out_stypes)))
     File "/usr/local/lib/python2.7/dist-packages/mxnet-0.11.1-py2.7.egg/mxnet/base.py", line 143, in check_call
       raise MXNetError(py_str(_LIB.MXGetLastError()))
   mxnet.base.MXNetError: [19:52:38] src/io/image_io.cc:165: Check failed: !dst.empty()
   
   Stack trace returned 10 entries:
   [bt] (0) /usr/local/lib/python2.7/dist-packages/mxnet-0.11.1-py2.7.egg/mxnet/libmxnet.so(_ZN4dmlc15LogMessageFatalD1Ev+0x3c) [0x7f9450196c8c]
   [bt] (1) /usr/local/lib/python2.7/dist-packages/mxnet-0.11.1-py2.7.egg/mxnet/libmxnet.so(_ZN5mxnet2io12ImdecodeImplEibPvmPNS_7NDArrayE+0x67a) [0x7f9451a3eefa]
   [bt] (2) /usr/local/lib/python2.7/dist-packages/mxnet-0.11.1-py2.7.egg/mxnet/libmxnet.so(_ZNSt17_Function_handlerIFvN5mxnet10RunContextENS0_6engine18CallbackOnCompleteEEZNS0_6Engine8PushSyncESt8functionIFvS1_EENS0_7ContextERKSt6vectorIPNS2_3VarESaISC_EESG_NS0_10FnPropertyEiPKcEUlS1_S3_E_E9_M_invokeERKSt9_Any_dataS1_S3_+0x23) [0x7f9450284963]
   [bt] (3) /usr/local/lib/python2.7/dist-packages/mxnet-0.11.1-py2.7.egg/mxnet/libmxnet.so(_ZN5mxnet6engine11NaiveEngine9PushAsyncESt8functionIFvNS_10RunContextENS0_18CallbackOnCompleteEEENS_7ContextERKSt6vectorIPNS0_3VarESaISA_EESE_NS_10FnPropertyEiPKc+0x8b) [0x7f94519daf4b]
   [bt] (4) /usr/local/lib/python2.7/dist-packages/mxnet-0.11.1-py2.7.egg/mxnet/libmxnet.so(_ZN5mxnet6Engine8PushSyncESt8functionIFvNS_10RunContextEEENS_7ContextERKSt6vectorIPNS_6engine3VarESaIS9_EESD_NS_10FnPropertyEiPKc+0x124) [0x7f9450285814]
   [bt] (5) /usr/local/lib/python2.7/dist-packages/mxnet-0.11.1-py2.7.egg/mxnet/libmxnet.so(_ZN5mxnet2io8ImdecodeERKN4nnvm9NodeAttrsERKSt6vectorINS_7NDArrayESaIS6_EEPS8_+0xc90) [0x7f9451a40a10]
   [bt] (6) /usr/local/lib/python2.7/dist-packages/mxnet-0.11.1-py2.7.egg/mxnet/libmxnet.so(_Z20ImperativeInvokeImplRKN5mxnet7ContextEON4nnvm9NodeAttrsEPSt6vectorINS_7NDArrayESaIS7_EESA_PS6_IbSaIbEESD_+0x3cf) [0x7f94519a91ff]
   [bt] (7) /usr/local/lib/python2.7/dist-packages/mxnet-0.11.1-py2.7.egg/mxnet/libmxnet.so(_Z22MXImperativeInvokeImplPviPS_PiPS0_iPPKcS5_+0x25b) [0x7f94519bb43b]
   [bt] (8) /usr/local/lib/python2.7/dist-packages/mxnet-0.11.1-py2.7.egg/mxnet/libmxnet.so(MXImperativeInvokeEx+0x2f) [0x7f94519a982f]
   [bt] (9) /usr/lib/x86_64-linux-gnu/libffi.so.6(ffi_call_unix64+0x4c) [0x7f9468629adc]
   
   [19:52:38] src/engine/naive_engine.cc:53: Engine shutdown
   
   
   
   ## Please provide the commands you ran that led to the error.
   
   I used the pretrained model from https://github.com/cypw/DPNs
   
   commands:
   python train.py --epoch 0 --model ./models/dpn92-extra --batch-size 4 --num-classes 2 --data-train ./lst_train.lst --image-train ./data/ --data-val ./lst_val.lst --image-val ./data/ --num-examples 2000 --lr 0.001 --gpus 0 --num-epoch 20 --save-result ./output
   
   I also tried --batch-size 16 and 32 and got the same result.
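
The failing check (`!dst.empty()` at src/io/image_io.cc:165) is raised from `imdecode` in the stack trace, which suggests, though this is an assumption and not a confirmed diagnosis, that one entry in the list file points at data that cannot be decoded as an image. A minimal sketch for narrowing this down without MXNet follows. The function name `find_suspect_images` is hypothetical, and the code assumes a tab-separated `.lst` file whose last column is the image path relative to the image root. It only inspects file headers, so a truncated but correctly-headed file would not be flagged.

```python
import os

# Leading bytes of JPEG, PNG and BMP files. The set of formats is an
# assumption for this sketch, not a list taken from MXNet.
_SIGNATURES = (b"\xff\xd8\xff", b"\x89PNG\r\n\x1a\n", b"BM")


def find_suspect_images(lst_path, image_root):
    """Return (line_number, path, reason) for entries unlikely to decode."""
    suspects = []
    with open(lst_path) as lst:
        for line_no, line in enumerate(lst, 1):
            if not line.strip():
                continue  # ignore blank lines
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 3:
                suspects.append((line_no, line.strip(), "malformed line"))
                continue
            # Assumed layout: index, label(s)..., relative path.
            path = os.path.join(image_root, fields[-1].strip())
            if not os.path.isfile(path):
                suspects.append((line_no, path, "missing file"))
                continue
            with open(path, "rb") as img:
                header = img.read(8)
            if not header:
                suspects.append((line_no, path, "empty file"))
            elif not header.startswith(_SIGNATURES):
                suspects.append((line_no, path, "unrecognized header"))
    return suspects
```

With the paths from the command above, `find_suspect_images("./lst_train.lst", "./data/")` would list the entries worth inspecting by hand. Running it on `./lst_val.lst` as well covers both iterators.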
   
   ## What have you tried to solve it?
   
   At first, training failed with the error "An fatal error occurred in asynchronous engine operation."
   Following a guide, I set the environment variables MXNET_CUDNN_AUTOTUNE_DEFAULT=0 and MXNET_ENGINE_TYPE=NaiveEngine, after which I got the output above.
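
For reference, the two variables can also be set from inside the training script instead of the shell. The sketch below is one way to do that; the helper name `apply_debug_env` is hypothetical, and placing the call before `import mxnet` is a conservative assumption so that the values are in the process environment when the library initializes.

```python
import os

# Values taken from the report above.
DEBUG_ENV = {
    "MXNET_ENGINE_TYPE": "NaiveEngine",
    "MXNET_CUDNN_AUTOTUNE_DEFAULT": "0",
}


def apply_debug_env(environ=os.environ):
    """Set the debug variables and return their previous values."""
    previous = {name: environ.get(name) for name in DEBUG_ENV}
    environ.update(DEBUG_ENV)
    return previous


apply_debug_env()
# import mxnet as mx  # import only after the variables are in place
```

The returned dictionary makes it possible to restore the earlier settings once debugging is finished.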
 
