Re: [apache/incubator-mxnet] [RFC] Unified API for Distributed Data Parallel Training (#16795)

2019-12-07 Thread Haibin Lin
I do expect the API to change in the future. Currently @szhengac, @zhongyuchen, and I are exploring APIs for gradient compression with a few algorithms, and we may bring the best practices back to MXNet.
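For context, MXNet's existing KVStore already exposes 2-bit gradient compression through `set_gradient_compression`, which a unified API would presumably need to generalize. A minimal sketch of the current usage (the threshold value is illustrative):

```python
import mxnet as mx
from mxnet import gluon

# Distributed KVStore; requires launching through MXNet's distributed
# launcher so the scheduler/server environment variables are set.
kv = mx.kv.create('dist_sync')

# Existing 2-bit compression hook: each gradient element is quantized
# to {-threshold, 0, +threshold}, with the residual carried forward.
kv.set_gradient_compression({'type': '2bit', 'threshold': 0.5})

net = gluon.nn.Dense(10)
net.initialize()
trainer = gluon.Trainer(net.collect_params(), 'sgd',
                        {'learning_rate': 0.1}, kvstore=kv)
```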

Re: [apache/incubator-mxnet] [RFC] Unified API for Distributed Data Parallel Training (#16795)

2019-11-12 Thread Leonard Lausen
Would it make sense to add optional support for sparse ndarrays and gradient compression in `AbstractKVStore`? You mentioned not all frameworks support it. Do you expect the API to change in the future?
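One way to keep such features optional (a hypothetical sketch; these method names are assumptions, not the RFC's interface) is to give `AbstractKVStore` default implementations that signal the capability is absent, so backends such as Horovod that lack them need not override anything:

```python
from abc import ABC, abstractmethod

class AbstractKVStore(ABC):
    """Hypothetical sketch of a unified KVStore interface with
    optional capabilities; method names are illustrative only."""

    @abstractmethod
    def pushpull(self, key, value, out=None):
        """Required: aggregate `value` across workers into `out`."""

    def set_gradient_compression(self, compression_params):
        """Optional: backends without compression keep this default."""
        raise NotImplementedError(
            f"{type(self).__name__} does not support gradient compression")

    @property
    def supports_sparse(self):
        """Optional capability flag for row_sparse gradients."""
        return False
```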

Re: [apache/incubator-mxnet] [RFC] Unified API for Distributed Data Parallel Training (#16795)

2019-11-12 Thread Haibin Lin
I did mean use cases 2, 3, 4. Initialization is done in the constructor `kv.__init__()`, and for Horovod it could simply be an `hvd.init()` call. I have not discussed problem 1 in much detail. Horovod uses mpirun to set up connections and launch processes, while BytePS/P3 and the native KVStore …
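To make the initialization point concrete, here is a hedged sketch of what a Horovod-backed implementation's constructor might look like; the class name and `pushpull` signature are assumptions, while the `horovod.mxnet` calls are the library's real API:

```python
import horovod.mxnet as hvd

class HorovodKVStore:
    """Hypothetical Horovod-backed implementation of the proposed
    interface; the class name and pushpull signature are assumptions."""

    def __init__(self):
        # All connection setup is delegated to Horovod, whose worker
        # processes are launched externally with mpirun/horovodrun.
        hvd.init()
        self.rank = hvd.rank()
        self.num_workers = hvd.size()

    def pushpull(self, key, value, out=None):
        # Horovod has no parameter server; pushpull maps to allreduce.
        reduced = hvd.allreduce(value, average=False, name=str(key))
        if out is not None:
            reduced.copyto(out)
        return reduced
```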

Re: [apache/incubator-mxnet] [RFC] Unified API for Distributed Data Parallel Training (#16795)

2019-11-12 Thread Lin Yuan
In the Limitations section, I suppose you meant 'use case 1, 3, 4', right?