[GitHub] leoxiaobin commented on issue #7455: Distributed training is slow

2017-08-15 Thread git
leoxiaobin commented on issue #7455: Distributed training is slow URL: https://github.com/apache/incubator-mxnet/issues/7455#issuecomment-322390365 @starimpact, I tried using 4 servers per machine and got almost the same result.

[GitHub] leoxiaobin commented on issue #7455: Distributed training is slow

2017-08-15 Thread git
leoxiaobin commented on issue #7455: Distributed training is slow URL: https://github.com/apache/incubator-mxnet/issues/7455#issuecomment-322390219 @szha, every server has 8 Titan Xp GPUs and 2 Intel Xeon E5-2650 v2 CPUs @ 2.60GHz. The two servers are connected with InfiniBand (IB) cards. The

[GitHub] leoxiaobin commented on issue #7455: Distributed training is slow

2017-08-14 Thread git
leoxiaobin commented on issue #7455: Distributed training is slow URL: https://github.com/apache/incubator-mxnet/issues/7455#issuecomment-322365618 @szha, I have tried dist_sync_device and got almost the same result. As for dist_async, it uses asynchronous SGD, so I don't think it can be