leoxiaobin commented on issue #7455: Distributed training is slow
URL:
https://github.com/apache/incubator-mxnet/issues/7455#issuecomment-322390365
@starimpact, I tried using 4 servers per machine and got almost the
same result.
leoxiaobin commented on issue #7455: Distributed training is slow
URL:
https://github.com/apache/incubator-mxnet/issues/7455#issuecomment-322390219
@szha, each server has 8 Titan Xp GPUs and 2 Intel Xeon E5-2650 v2 CPUs
@ 2.60GHz.
The two servers are connected with InfiniBand (IB) cards.
The
leoxiaobin commented on issue #7455: Distributed training is slow
URL:
https://github.com/apache/incubator-mxnet/issues/7455#issuecomment-322365618
@szha, I tried dist_sync_device and got almost the same result.
For dist_async, which uses async SGD, I don't think it can be