GoodJoey opened a new issue #9709: what will happen if one of the node reboot when doing the distribute training?
URL: https://github.com/apache/incubator-mxnet/issues/9709

If I'm doing distributed training and one of my nodes is rebooted (or is 'dead' for some other reason), what will happen? Will the training fail, or will it find another node (if there is one) to use instead?