Re: Horovod-MXNet Integration

2019-01-30 Thread Aaron Markham
… put all workloads into pushpull. Broadcast can be implemented by pull. What are "local workers"? GPUs in a single machine? If so, we can query that directly. …
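For context, a minimal sketch (mine, not code from the thread) of the broadcast-as-pull idea in MXNet's KVStore API: one init places the tensor on the key-value server, and a pull on every worker then acts as the broadcast. The 'local' store is used only so the snippet runs in a single process; a real deployment would create a 'dist_sync' store under the distributed launcher.

    import mxnet as mx

    kv = mx.kv.create('local')   # 'dist_sync' on a real cluster
    key = 0
    value = mx.nd.random.uniform(shape=(2, 3))

    kv.init(key, value)          # place the tensor on the server once
    out = mx.nd.zeros((2, 3))
    kv.pull(key, out=out)        # every worker pulling the key = broadcast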

Re: Horovod-MXNet Integration

2019-01-30 Thread Lin Yuan
… directly. On Fri, Sep 14, 2018 at 4:46 PM Carl Yang wrote: Hi, Currently, distributed MXNet training can only be done using the parameter server. Horovod …

Re: Horovod-MXNet Integration

2019-01-30 Thread Yuan Tang
… On Fri, Sep 14, 2018 at 4:46 PM Carl Yang wrote: Hi, Currently, distributed MXNet training can only be done using the parameter server. Horovod is an open-source distributed training framework that has shown a 2x speedup compared to TensorFlow using Parameter Server. …

Re: Horovod-MXNet Integration

2018-11-02 Thread Lin Yuan
… Horovod is an open-source distributed training framework that has shown a 2x speedup compared to TensorFlow using Parameter Server. We propose to add Horovod support to MXNet. This will help our users achieve the goal of linear scalability to 256 GPUs and beyond. Design proposal …
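As a concrete picture of what the proposal enables, here is a hedged sketch using the Horovod MXNet bindings that eventually shipped (not code from this thread): each process initializes Horovod, pins itself to one GPU, wraps the Gluon optimizer in hvd.DistributedTrainer so gradients are averaged by allreduce instead of pushed to a parameter server, and broadcasts the initial parameters from rank 0 so every worker starts from identical weights.

    import mxnet as mx
    from mxnet import gluon
    import horovod.mxnet as hvd

    hvd.init()
    ctx = mx.gpu(hvd.local_rank())        # one process per GPU

    net = gluon.nn.Dense(10)
    net.initialize(ctx=ctx)
    params = net.collect_params()

    # Scale the learning rate by worker count, as is conventional.
    opt = mx.optimizer.SGD(learning_rate=0.01 * hvd.size())
    trainer = hvd.DistributedTrainer(params, opt)

    # Start every worker from identical weights.
    hvd.broadcast_parameters(params, root_rank=0)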

Re: Horovod-MXNet Integration

2018-10-31 Thread Mu Li
… https://cwiki.apache.org/confluence/display/MXNET/Horovod-MXNet+Integration Please feel free to let me know if you have any suggestions or feedback. Regards, Carl

Horovod-MXNet Integration

2018-09-14 Thread Carl Yang
Hi,

Currently, distributed MXNet training can only be done using the parameter server. Horovod is an open-source distributed training framework that has shown a 2x speedup compared to TensorFlow using Parameter Server. We propose to add Horovod support to MXNet. This will help our users achieve the goal of linear scalability to 256 GPUs and beyond.

Design proposal on cwiki:
https://cwiki.apache.org/confluence/display/MXNET/Horovod-MXNet+Integration

Please feel free to let me know if you have any suggestions or feedback.

Regards,
Carl
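For archive readers closing the loop: once Horovod's MXNet support landed, a script like the sketch above is launched the same way as any other Horovod job, one process per GPU. A hypothetical single-host launch (the script name is made up):

    # 4 processes on localhost, one per GPU; horovodrun sets up ranks via MPI/Gloo
    horovodrun -np 4 -H localhost:4 python train_mxnet_hvd.py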