[ https://issues.apache.org/jira/browse/MXNET-18?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16359184#comment-16359184 ]
Sheng Zha commented on MXNET-18:
--------------------------------

Recently szha@ attempted to use ndarray for the accuracy metric, which was previously based on numpy. The change was later reverted because it caused a performance regression for ResNet-50 on 8 Volta GPUs. Looking into the accuracy metric speed for this set-up, I found the following numbers:

* previous numpy version: throughput 5753.84, Time cost=94.509
* ndarray per-batch blocking version: throughput 4456.85, Time cost=120.803
* latest ndarray version: throughput 4459.25, Time cost=120.602

To understand why, I did some simple profiling, and here’s what I found:

* cumulative time consumed by the metric is now 42.96%, versus only 2.51% for the numpy version
* the majority of the time is spent in __del__, followed by astype

The profiling results can be found at:

* numpy version: [http://vmprof.com/#/f0553a92-e576-410d-8784-71e185c8a39d?id=3,3,4&view=flames]
* latest ndarray version: [http://vmprof.com/#/e35f2e5d-f8f2-4c7b-b955-8a58da5f8a88?id=3,2,4&view=flames]

It’s surprising that __del__ takes up so much time. Would you mind looking into why __del__ is taking up so much time and how we can overcome this problem?

To reproduce the speed numbers, I used the existing example in mxnet.
The following command runs the script without depending on the data record files (even though the arguments might suggest otherwise):

    python example/image-classification/train_imagenet.py \
        --gpu 0,1,2,3,4,5,6,7 --batch-size 1024 --num-epochs 1 \
        --data-train /data/imagenet/train-480-val-256-recordio/train.rec \
        --data-train-idx /data/imagenet/train-480-val-256-recordio/train.idx \
        --data-val /data/imagenet/train-480-val-256-recordio/val.rec \
        --disp-batches 100 --network resnet-v1 --num-layers 50 \
        --data-nthreads 40 --min-random-scale 0.533 \
        --max-random-shear-ratio 0 --max-random-rotate-angle 0 \
        --max-random-h 0 --max-random-l 0 --max-random-s 0 \
        --dtype float16 --benchmark 1 --kv-store device

> Should investigate why __del__ takes a long time
> ------------------------------------------------
>
>                 Key: MXNET-18
>                 URL: https://issues.apache.org/jira/browse/MXNET-18
>             Project: Apache MXNet
>          Issue Type: Bug
>            Reporter: Sheng Zha
>            Priority: Major
>

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
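
For anyone following up on the profiling question in the comment, here is a minimal stand-alone sketch of how the per-function share of a metric update, including time attributed to __del__ and astype, could be inspected with Python's built-in cProfile. It does not use MXNet: FakeHandle, update_accuracy, and profile_updates are hypothetical names invented for illustration, and the __del__ body only stands in for whatever the real array object does when it is released. The numbers it produces say nothing about the regression reported above.

```python
import cProfile
import pstats


class FakeHandle:
    """Hypothetical stand-in for an array object that owns a backend handle."""

    freed = 0  # counts how many objects have been finalized

    def __init__(self, values):
        self.values = values

    def astype(self, cast):
        # Returns a new temporary object, as a dtype conversion would.
        return FakeHandle([cast(v) for v in self.values])

    def __del__(self):
        # Stand-in for releasing the handle; the real cost is backend-specific.
        FakeHandle.freed += 1


def update_accuracy(labels, preds):
    """Toy accuracy update that creates two temporaries on every call."""
    pred_int = preds.astype(int)
    label_int = labels.astype(int)
    hits = sum(p == l for p, l in zip(pred_int.values, label_int.values))
    return hits, len(label_int.values)


def profile_updates(num_batches=200, batch_size=64):
    """Profile repeated metric updates; return {name: (ncalls, cumtime)}."""
    labels = FakeHandle([float(i % 10) for i in range(batch_size)])
    preds = FakeHandle([float(i % 5) for i in range(batch_size)])

    profiler = cProfile.Profile()
    profiler.enable()
    for _ in range(num_batches):
        update_accuracy(labels, preds)
    profiler.disable()

    # pstats stores (primitive calls, total calls, total time,
    # cumulative time, callers) per (file, line, function name) key.
    summary = {}
    for (_file, _line, name), entry in pstats.Stats(profiler).stats.items():
        ncalls, cumtime = entry[1], entry[3]
        prev_calls, prev_time = summary.get(name, (0, 0.0))
        summary[name] = (prev_calls + ncalls, prev_time + cumtime)
    return summary


if __name__ == "__main__":
    for name, (ncalls, cumtime) in sorted(profile_updates().items()):
        print(f"{name:40s} calls={ncalls:6d} cumtime={cumtime:.6f}s")
```

Each call to update_accuracy creates two temporaries, so the __del__ and astype call counts should scale with the number of batches. Comparing their cumulative time against that of update_accuracy gives a rough view of how much of the metric update goes to object creation and teardown in this toy setting.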