[GitHub] [incubator-singa] chrishkchris commented on issue #535: SINGA-490 Optimize performance of stochastic gradient descent (SGD)

2019-09-18 Thread GitBox
chrishkchris commented on issue #535: SINGA-490 Optimize performance of stochastic gradient descent (SGD)
URL: https://github.com/apache/incubator-singa/pull/535#issuecomment-532944885
 
 
   Finally, I tested distributed training on an AWS p2.8xlarge instance after adding Sync() to the SGD loop of resnet.py and resnet_dist.py.
   The speedup with 8 GPUs is now 7.21x, but note that this was measured without real data feeding.
   The following shows the throughput of resnet.py and resnet_dist.py:
   
   ```
   ubuntu@ip-172-31-28-231:~/incubator-singa/examples/autograd$ python3 resnet.py
   Start intialization
   100%|█| 100/100 [01:23<00:00,  1.19it/s]
   Throughput = 38.13589358185999 per second
   Total=0.8391045022010803, forward=0.26401839971542357, softmax=0.0020227289199829103, backward=0.5730633735656739, sgd=0.016838366985321044
   
   ubuntu@ip-172-31-28-231:~/incubator-singa/examples/autograd$ /home/ubuntu/mpich-3.3/build/bin/mpiexec --hostfile host_file python3 resnet_dist.py
   Start intialization...
   100%|██| 100/100 [01:33<00:00,  1.08it/s]
   Throughput = 274.9947180123401 per second
   Total=0.9309269714355469, forward=0.2690380573272705, softmax=0.0021610450744628906, backward=0.6597278690338135, sgd=0.10374969005584717
   ```
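
As a quick check of the 7.21x figure, the speedup and per-GPU scaling efficiency can be recomputed from the two throughput lines in the log. This is only a sketch: the variable names are made up for illustration, and reading the larger `sgd` time in the distributed run as overhead from the added synchronization is an assumption, not something the log states.

```python
# Values copied verbatim from the two runs in the log (throughput is "per second" as printed).
single_gpu_throughput = 38.13589358185999   # resnet.py
multi_gpu_throughput = 274.9947180123401    # resnet_dist.py
num_gpus = 8

speedup = multi_gpu_throughput / single_gpu_throughput
scaling_efficiency = speedup / num_gpus

# Per-iteration time of the sgd step reported by each run (seconds).
sgd_single = 0.016838366985321044
sgd_dist = 0.10374969005584717
# Assumption: the difference is mostly synchronization/communication cost.
sgd_overhead = sgd_dist - sgd_single

print(f"speedup = {speedup:.2f}x")
print(f"scaling efficiency = {scaling_efficiency:.1%}")
print(f"extra time in sgd step = {sgd_overhead:.4f} s per iteration")
```

The ratio works out to roughly 7.21, i.e. about 90% of ideal linear scaling on 8 GPUs.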


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-singa] chrishkchris commented on issue #535: SINGA-490 Optimize performance of stochastic gradient descent (SGD)

2019-09-18 Thread GitBox
chrishkchris commented on issue #535: SINGA-490 Optimize performance of stochastic gradient descent (SGD)
URL: https://github.com/apache/incubator-singa/pull/535#issuecomment-532723250
 
 
   Next, I further improved the running time by using in-place elementwise multiplication in many functions, such as ReLU. Here are the results for mnist_cnn.py:
   
   ```
   ubuntu@ip-172-31-39-137:~/incubator-singa/examples/autograd$ python3 mnist_cnn.py
   Starting Epoch 0:
   Training loss = 585.42, training accuracy = 0.793390
   Evaluation accuracy = 0.939303, Elapsed Time = 4.206943s
   Starting Epoch 1:
   Training loss = 234.893921, training accuracy = 0.922409
   Evaluation accuracy = 0.955729, Elapsed Time = 4.101450s
   Starting Epoch 2:
   Training loss = 169.515244, training accuracy = 0.943286
   Evaluation accuracy = 0.970252, Elapsed Time = 4.104907s
   Starting Epoch 3:
   Training loss = 136.331894, training accuracy = 0.954442
   Evaluation accuracy = 0.968450, Elapsed Time = 4.115959s
   Starting Epoch 4:
   Training loss = 118.268318, training accuracy = 0.960512
   Evaluation accuracy = 0.971755, Elapsed Time = 4.117009s
   Starting Epoch 5:
   Training loss = 104.006439, training accuracy = 0.965732
   Evaluation accuracy = 0.978866, Elapsed Time = 4.117350s
   Starting Epoch 6:
   Training loss = 93.860809, training accuracy = 0.969067
   Evaluation accuracy = 0.977464, Elapsed Time = 4.106471s
   Starting Epoch 7:
   Training loss = 88.009178, training accuracy = 0.970251
   Evaluation accuracy = 0.982873, Elapsed Time = 4.116037s
   Starting Epoch 8:
   Training loss = 81.978348, training accuracy = 0.972802
   Evaluation accuracy = 0.983974, Elapsed Time = 4.121274s
   Starting Epoch 9:
   Training loss = 75.998878, training accuracy = 0.974103
   Evaluation accuracy = 0.982272, Elapsed Time = 4.122591s
   ```
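
To illustrate what "in-place elementwise multiply" means for ReLU, here is a minimal sketch using plain Python lists. It is not SINGA's tensor code; the function names `relu_out_of_place` and `relu_in_place` are hypothetical and exist only for this example. The idea shown is that multiplying each element by a 0/1 mask and writing the product back into the same buffer avoids allocating a second buffer for the output.

```python
def relu_out_of_place(x):
    """Return a new list; the input buffer is left untouched."""
    return [v if v > 0 else 0.0 for v in x]


def relu_in_place(x):
    """Overwrite x with x * mask, where mask is 1.0 for positive entries, else 0.0."""
    for i, v in enumerate(x):
        mask = 1.0 if v > 0 else 0.0
        x[i] = v * mask  # product is written back into the same buffer
    return x


data = [-2.0, 0.5, 0.0, 3.0]
expected = relu_out_of_place(data)   # allocates a second buffer
result = relu_in_place(data)         # reuses the original buffer

assert result is data        # no new buffer was created
assert result == expected    # same values either way (-0.0 compares equal to 0.0)
```

The trade-off is that the input values are destroyed, so an in-place variant is only safe where the original input is not needed afterwards.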
   

