[GitHub] feiyuvl commented on issue #9611: program can't finished normally in dist_sync mode

2018-03-01 Thread GitBox
feiyuvl commented on issue #9611: program can't finished normally in dist_sync mode URL: https://github.com/apache/incubator-mxnet/issues/9611#issuecomment-369529615 @XiaotaoChen Sorry, MXNet has removed the epoch-size argument; it exists only in older versions.

[GitHub] feiyuvl commented on issue #9611: program can't finished normally in dist_sync mode

2018-01-30 Thread GitBox
feiyuvl commented on issue #9611: program can't finished normally in dist_sync mode URL: https://github.com/apache/incubator-mxnet/issues/9611#issuecomment-361849189 For distributed training, it's better to set the total number of iterations on each node to the same value. The previous model