idealboy commented on issue #7412: About van when using distribute training
URL: 
https://github.com/apache/incubator-mxnet/issues/7412#issuecomment-321456391
 
 
   My environment settings for MXNet dist_sync on two machines are below. The two 
machines can reach each other over SSH:
   
   (1)10.15.240.189:
   
   export DMLC_NUM_WORKER=1
   export MXNET_GPU_WORKER_NTHREADS=8
   export MXNET_CPU_WORKER_NTHREADS=8
   export MXNET_CPU_PRIORITY_NTHREADS=8
   export MXNET_GPU_COPY_NTHREADS=2
   export DMLC_NUM_SERVER=1
   export DMLC_PS_ROOT_URI=10.15.240.189
   export DMLC_PS_ROOT_PORT=3000
   export DMLC_ROLE=scheduler
   export DMLC_INTERFACE="eth0"
   
   (2)10.155.133.82:
   
   export DMLC_ROLE=worker
   export DMLC_WORKER_NUM=1
   export DMLC_SERVER_NUM=1
   export DMLC_PS_ROOT_URI=10.15.240.189
   export DMLC_PS_ROOT_PORT=8000
   export DMLC_ROLE=worker
   export DMLC_INTERFACE="eth0"
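
   A minimal sketch for comparing the two sets of settings above, offered as an 
illustration rather than as part of MXNet or ps-lite. It assumes that the 
worker/server counts and the scheduler address and port are expected to be 
identical on every node, and that the variable names used on machine (1) are 
the ones to check. The helper names `parse_exports` and `compare_shared` are 
hypothetical, and the MXNET_* thread settings are left out for brevity.

```python
# Hypothetical helper for comparing DMLC_* settings between two machines.
# Assumption: these four variables should carry the same value on every node.
SHARED_KEYS = (
    "DMLC_NUM_WORKER",
    "DMLC_NUM_SERVER",
    "DMLC_PS_ROOT_URI",
    "DMLC_PS_ROOT_PORT",
)


def parse_exports(text):
    """Parse 'export KEY=VALUE' lines into a dict; a later line overrides an earlier one."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("export "):
            continue
        key, _, value = line[len("export "):].partition("=")
        env[key.strip()] = value.strip().strip('"')
    return env


def compare_shared(env_a, env_b, keys=SHARED_KEYS):
    """Return {key: (value_a, value_b)} for keys that differ or are missing on either side."""
    problems = {}
    for key in keys:
        a, b = env_a.get(key), env_b.get(key)
        if a is None or b is None or a != b:
            problems[key] = (a, b)
    return problems


# Settings copied from machine (1), 10.15.240.189.
machine_1 = parse_exports("""
export DMLC_NUM_WORKER=1
export DMLC_NUM_SERVER=1
export DMLC_PS_ROOT_URI=10.15.240.189
export DMLC_PS_ROOT_PORT=3000
export DMLC_ROLE=scheduler
export DMLC_INTERFACE="eth0"
""")

# Settings copied from machine (2), 10.155.133.82.
machine_2 = parse_exports("""
export DMLC_ROLE=worker
export DMLC_WORKER_NUM=1
export DMLC_SERVER_NUM=1
export DMLC_PS_ROOT_URI=10.15.240.189
export DMLC_PS_ROOT_PORT=8000
export DMLC_ROLE=worker
export DMLC_INTERFACE="eth0"
""")

for key, (a, b) in compare_shared(machine_1, machine_2).items():
    print(f"{key}: machine 1 = {a!r}, machine 2 = {b!r}")
```

   Run against the settings as posted, the check reports every key whose value 
is missing on one machine or differs between the two.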
   
   
   My launch command is:
   python ../../tools/launch.py -n 2 -H hosts --sync-dst-dir /tmp/mxnet \
       python train_mnist.py --network lenet --gpus 0 --kv-store dist_sync
   
   
   
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services
