GoodJoey opened a new issue #10017: When doing distributed training, the
bandwidth becomes the bottleneck.
URL: https://github.com/apache/incubator-mxnet/issues/10017
 
 
   Are there any experiences or magic solutions for doing distributed
training with MXNet?
   Say, is there a way to force a worker to communicate with the parameter
server on the same node to save bandwidth? Or something like
ring-allreduce (Horovod)?
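
   For context, a minimal sketch of one bandwidth-saving option MXNet itself exposes: 2-bit gradient compression on a distributed kvstore. This is not from the issue; the model, dummy data, and threshold value are placeholders chosen only for illustration.

```python
import mxnet as mx
from mxnet import gluon, autograd

# 'dist_sync' keeps workers in lockstep with the parameter servers;
# 'dist_async' is the asynchronous alternative.
kv = mx.kv.create('dist_sync')

# Quantize gradients to 2 bits before pushing them to the servers.
# 'threshold' controls the quantization step; 0.5 is illustrative only.
kv.set_gradient_compression({'type': '2bit', 'threshold': 0.5})

net = gluon.nn.Dense(10)                 # placeholder model
net.initialize(mx.init.Xavier())
trainer = gluon.Trainer(net.collect_params(), 'sgd',
                        {'learning_rate': 0.01}, kvstore=kv)

# One illustrative training step on dummy data.
data = mx.nd.random.uniform(shape=(32, 100))
label = mx.nd.random.uniform(shape=(32, 10))
loss_fn = gluon.loss.L2Loss()
with autograd.record():
    loss = loss_fn(net(data), label)
loss.backward()
trainer.step(batch_size=32)
```

   A script like this is normally started through MXNet's launcher, e.g. `python tools/launch.py -n 4 -s 4 -H hosts python train.py`, which also decides how many servers run and on which hosts; the flag values shown are illustrative, not a recommendation from this issue.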
