Hi Tali,
Yes, I think the foreach API is currently experimental, and multi-device
support is future work. The existing implementation uses the main thread to
wait for the execution result and does not handle the case of data-parallel
training on multiple GPUs. However, if you use Gluon, you can probably
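(Since the Gluon suggestion above is cut off, the following is only a guess at the direction it was heading: a minimal sketch of the usual Gluon data-parallel pattern, splitting each batch across device contexts with split_and_load. The contexts, shapes, and the toy Dense network are illustrative assumptions; this sketch does not itself address the foreach limitation.)

```python
import mxnet as mx
from mxnet import gluon, autograd

# Illustrative assumption: two GPUs are available; any context list works.
ctxs = [mx.gpu(0), mx.gpu(1)]

# Toy network standing in for the real model (which is not shown here).
net = gluon.nn.Dense(10)
net.initialize(ctx=ctxs)  # replicate parameters on every device
trainer = gluon.Trainer(net.collect_params(), 'sgd', {'learning_rate': 0.1})
loss_fn = gluon.loss.L2Loss()

batch = mx.nd.random.uniform(shape=(64, 128))
labels = mx.nd.random.uniform(shape=(64, 10))

# split_and_load scatters the batch (and labels) across the device contexts.
data_parts = gluon.utils.split_and_load(batch, ctxs)
label_parts = gluon.utils.split_and_load(labels, ctxs)

with autograd.record():
    losses = [loss_fn(net(x), y) for x, y in zip(data_parts, label_parts)]
for l in losses:
    l.backward()            # gradients computed per device
trainer.step(batch.shape[0])  # trainer aggregates gradients across devices
```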
Hello fellow MXNetters,
We've seen that the subgraph execution mechanism used to run things
like the foreach operator causes MXExecutorForward to block, instead of just
issuing the ops in the normal asynchronous way.
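To make the observation concrete, here is a minimal sketch of the kind of graph involved. The loop body, shapes, and timing are illustrative assumptions, not our actual model; the point is simply a plain executor forward over a graph containing mx.sym.contrib.foreach, which is where we see the call block rather than return immediately.

```python
import time
import mxnet as mx

data = mx.sym.var('data')
init = mx.sym.var('init')

def step(x, states):
    # Toy loop body: add the current time-step slice to the running state.
    out = x + states[0]
    return out, [out]

# foreach builds a subgraph from `step` and iterates over axis 0 of `data`.
out, _ = mx.sym.contrib.foreach(step, data, [init])

exe = out.bind(mx.cpu(), {'data': mx.nd.ones((100, 1024)),
                          'init': mx.nd.zeros((1024,))})

start = time.time()
exe.forward()               # with the subgraph op, this call is where we see the wait,
print('forward returned after %.3fs' % (time.time() - start))
mx.nd.waitall()             # rather than only here, once the async engine drains
```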