Re: Changes to MPI-operator

2019-04-15 Thread Yuan Tang
I am cc’ing MXNet dev mailing list here.

Thanks for the note, Roshani. Looking forward to seeing your contribution!
Though let’s also discuss this on the MXNet dev mailing list, since other
people (e.g. Carl and Lin) might be working on this as well, to avoid
duplicate work.

Best,
Yuan

On Mon, Apr 15, 2019 at 5:51 PM Rong Ou  wrote:

> Sounds great! Yes it would be nice to have some examples for MXNet.
>
> On Mon, Apr 15, 2019 at 3:36 PM Roshani Nagmote 
> wrote:
>
>> Hi,
>>
>> I work on Apache MXNet, and I recently used MPI-Operator to run
>> distributed training with MXNet and Horovod on Kubernetes.
>> A few other folks and I tried to adjust the capacity of a training job
>> based on the available workers, and to restart the training job from where
>> it left off if any worker goes away in between.
>>
>> To do this, we had to make a few modifications to MPI-operator, for
>> example, updating workerReplicas and launcherRole. Currently the changes
>> are in my repo, and I will be making a PR on MPI-operator with them. I am
>> also planning to contribute a few examples. I wanted to reach out to you
>> first before creating a PR.
>>
>> Please let me know what your thoughts are on this.
>>
>> Thanks,
>> Roshani
>>
>
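[Editor's note: the capacity-adjustment behavior described in the email above could look roughly like the following minimal Go sketch. The field names WorkerReplicas and LauncherRole are taken from the email; the MPIJobSpec struct and adjustCapacity helper here are hypothetical illustrations, not the actual mpi-operator code or CRD types.]

```go
package main

import "fmt"

// MPIJobSpec is a hypothetical stand-in for the job spec fields the
// email mentions tweaking (workerReplicas, launcherRole).
type MPIJobSpec struct {
	WorkerReplicas int32  // desired number of worker pods for the job
	LauncherRole   string // role assigned to the launcher pod
}

// adjustCapacity shrinks the job to the number of workers actually
// available, mirroring the "adjust capacity based on available workers"
// behavior described in the email. Training would then resume from the
// last checkpoint with the reduced worker count.
func adjustCapacity(spec *MPIJobSpec, available int32) {
	if available < spec.WorkerReplicas {
		spec.WorkerReplicas = available
	}
}

func main() {
	spec := &MPIJobSpec{WorkerReplicas: 8, LauncherRole: "mpi-launcher"}
	adjustCapacity(spec, 5) // a worker went away; scale down to what's left
	fmt.Println(spec.WorkerReplicas) // prints 5
}
```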


[QUESTION] mxnet/Tuple vs nnvm/Tuple

2019-04-15 Thread Lin Yuan
Dear Community,

Currently in MXNet there are two Tuple template classes, defined in
mxnet/tuple.h and nnvm/tuple.h respectively. These two templates are highly
similar, and most of their code is duplicated except for a couple of
functions. However, they are used interchangeably in the current codebase,
which sometimes causes conflicts.

Is there any historical reason that we keep two copies of the same template
class? If not, can we refactor the code to consolidate into one?

Thanks!

Lin