Hello fellow MXNetters
We've noticed that the subgraph execution mechanism used to run operators like
foreach causes MXExecutorForward to block, instead of just issuing the ops in
the normal asynchronous way
(https://github.com/apache/incubator-mxnet/blob/212364b0cba28aeda989378f6e630f7a61749bf3/src/executor/graph_executor.cc#L1352).
On its own this is surprising and can cause issues if you're not expecting it,
such as your time showing up in MXExecutorForward instead of in
WaitAll / WaitRead. Is there a reason this isn't automatically done on a
separate thread for you? Is it to ensure that subsequent ops issued on the
original thread are correctly serialized with respect to the ops produced by
the foreach?
More importantly, this has the unfortunate implication that if you use
multi-device parallelism with foreach by simply looping over your executors and
calling Forward on each, you inadvertently serialize much of the computation:
you can't call Forward on the second executor until Forward on the first has
returned, and the foreach causes that first Forward call to block until its
forward pass is (mostly) done!
This effectively kills multi-device parallelism unless one builds a thread pool
to 'unblock' Forward (and probably the subsequent Backward), so that each
device's Forward runs on its own thread.
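To make the serialization concrete, here is a minimal sketch of the workaround we mean. Note that blocking_forward is a hypothetical stand-in for Executor.Forward blocking on a foreach subgraph, not the real MXNet API; the point is only that the naive loop serializes the per-device work, while one thread per device lets the blocking calls overlap:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def blocking_forward(device_id):
    """Stand-in for an Executor's Forward when a foreach subgraph
    makes it block until the forward pass is (mostly) done."""
    time.sleep(0.2)  # simulates the blocking subgraph execution
    return device_id

devices = range(4)

# Naive loop over executors: each Forward blocks, so the
# devices end up running one after another (~0.8 s total here).
start = time.time()
for d in devices:
    blocking_forward(d)
serial = time.time() - start

# Workaround: one thread per device so the blocking Forward
# calls overlap (~0.2 s total here).
start = time.time()
with ThreadPoolExecutor(max_workers=len(devices)) as pool:
    results = list(pool.map(blocking_forward, devices))
threaded = time.time() - start

print(serial, threaded)
```

This is roughly the shape of the thread pool we'd have to maintain ourselves, including joining the threads before issuing the Backward calls.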
Is this intended? Are we missing something about how subgraphs are supposed to
be used in conjunction with multi-device parallelism? It seems like a weakness
in the current design of subgraph execution. The Python API also doesn't appear
to have any strategy for dealing with this issue; as you can see at
https://github.com/apache/incubator-mxnet/blob/2276bb0e30b1fe601eb288cb4f1b673484892d4b/python/mxnet/executor_manager.py#L281,
it doesn't spawn separate threads there either.
Thanks!
Tali + Sebastian