feiyuvl commented on issue #9611: program can't finished normally in dist_sync
mode
URL:
https://github.com/apache/incubator-mxnet/issues/9611#issuecomment-369529615
@XiaotaoChen Sorry, MXNet has removed the epoch-size argument; it only
exists in older versions.
feiyuvl commented on issue #9611: program can't finished normally in dist_sync
mode
URL:
https://github.com/apache/incubator-mxnet/issues/9611#issuecomment-361849189
For distributed training, it's better to have every node run the same total
number of iterations. The previous model