[ 
https://issues.apache.org/jira/browse/MXNET-18?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16359184#comment-16359184
 ] 

Sheng Zha commented on MXNET-18:
--------------------------------

Recently szha@ attempted to switch the accuracy metric, which was previously 
based on numpy, over to ndarray. The change was later reverted because it 
caused a performance regression for ResNet-50 on 8 Volta GPUs. Looking into 
the accuracy metric speed for this setup, I found the following numbers:

for previous numpy version:
  throughput 5753.84
  Time cost=94.509

for ndarray per-batch blocking version:
  throughput 4456.85
  Time cost=120.803

for latest ndarray version:
  throughput 4459.25
  Time cost=120.602
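
To put the regression in relative terms, here is a small sketch that computes the percentage change of each ndarray run against the numpy baseline. The numbers are copied from the measurements above; the dictionary layout and function names are only illustrative. It works out to roughly 22.5% lower throughput and about 28% higher time cost for both ndarray versions.

```python
# Measurements copied from the runs listed above.
runs = {
    "numpy": {"throughput": 5753.84, "time_cost": 94.509},
    "ndarray per-batch blocking": {"throughput": 4456.85, "time_cost": 120.803},
    "latest ndarray": {"throughput": 4459.25, "time_cost": 120.602},
}


def relative_change(baseline, value):
    """Return the percentage change of value relative to baseline."""
    return (value - baseline) / baseline * 100.0


def compare_to_baseline(runs, baseline_name="numpy"):
    """Map each non-baseline run to (throughput change %, time cost change %)."""
    base = runs[baseline_name]
    report = {}
    for name, run in runs.items():
        if name == baseline_name:
            continue
        report[name] = (
            relative_change(base["throughput"], run["throughput"]),
            relative_change(base["time_cost"], run["time_cost"]),
        )
    return report


if __name__ == "__main__":
    for name, (tp, tc) in compare_to_baseline(runs).items():
        print(f"{name}: throughput {tp:+.1f}%, time cost {tc:+.1f}%")
```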

To understand why, I did some simple profiling, and here’s what I found:

The metric now accounts for 42.96% of cumulative time, versus only 2.51% for 
the numpy version.

Most of that time is spent in __del__, followed by astype.
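
For anyone who wants to check how much time a finalizer takes without setting up vmprof, the following is a minimal sketch using the standard-library cProfile instead (not the profiler used for the results here). FakeHandle and update_metric are hypothetical stand-ins, not MXNet code; the point is only to show how the cumulative share of __del__ can be read out of the profile.

```python
import cProfile
import pstats


class FakeHandle:
    """Hypothetical stand-in for an object with a costly finalizer."""

    def __del__(self):
        # Simulate cleanup work that runs when the object is freed.
        sum(range(2000))


def update_metric(num_batches):
    """Toy metric loop that creates and drops one temporary per batch."""
    correct = 0
    for i in range(num_batches):
        tmp = FakeHandle()  # the previous object is freed when tmp is rebound
        correct += i % 2
    return correct


def profile_del_share(num_batches=2000):
    """Return (number of __del__ calls, their share of total profiled time)."""
    profiler = cProfile.Profile()
    profiler.enable()
    update_metric(num_batches)
    profiler.disable()

    stats = pstats.Stats(profiler)
    calls = 0
    cumulative = 0.0
    # Keys are (filename, line number, function name); values are
    # (primitive calls, total calls, own time, cumulative time, callers).
    for (_, _, func_name), (_, ncalls, _, cumtime, _) in stats.stats.items():
        if func_name == "__del__":
            calls += ncalls
            cumulative += cumtime
    return calls, cumulative / stats.total_tt


if __name__ == "__main__":
    calls, share = profile_del_share()
    print(f"__del__ called {calls} times, {share:.1%} of profiled time")
```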

The profiling results can be found at

numpy version: 
[http://vmprof.com/#/f0553a92-e576-410d-8784-71e185c8a39d?id=3,3,4&view=flames]

latest ndarray version: 
[http://vmprof.com/#/e35f2e5d-f8f2-4c7b-b955-8a58da5f8a88?id=3,2,4&view=flames]

 

It’s surprising that __del__ takes up so much time. Would you mind looking into 
why that is and how we can overcome this problem?

 

To reproduce the speed numbers, I used the existing example in mxnet. The 
following command runs the script without depending on the data record files 
(even though the arguments might suggest otherwise):

python example/image-classification/train_imagenet.py \
  --gpu 0,1,2,3,4,5,6,7 --batch-size 1024 --num-epochs 1 \
  --data-train /data/imagenet/train-480-val-256-recordio/train.rec \
  --data-train-idx /data/imagenet/train-480-val-256-recordio/train.idx \
  --data-val /data/imagenet/train-480-val-256-recordio/val.rec \
  --disp-batches 100 --network resnet-v1 --num-layers 50 \
  --data-nthreads 40 --min-random-scale 0.533 \
  --max-random-shear-ratio 0 --max-random-rotate-angle 0 \
  --max-random-h 0 --max-random-l 0 --max-random-s 0 \
  --dtype float16 --benchmark 1 --kv-store device

> Should investigate why __del__ takes a long time
> ------------------------------------------------
>
>                 Key: MXNET-18
>                 URL: https://issues.apache.org/jira/browse/MXNET-18
>             Project: Apache MXNet
>          Issue Type: Bug
>            Reporter: Sheng Zha
>            Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@mxnet.apache.org
For additional commands, e-mail: issues-h...@mxnet.apache.org
