caiqi opened a new issue #8139: mxnet ssd training speed slow down after some batches
URL: https://github.com/apache/incubator-mxnet/issues/8139

## Environment info
Operating System: Windows
Package used (Python/R/Scala/Julia): Python
MXNet version: 0.11.0
Or if installed from source: install with pip

When training an SSD detection model on multiple GPUs, the training speed slows down after running for a long time. Is there any way to solve the problem? Thanks.

> INFO:root:Epoch[0] Batch [20] Speed: 74.29 samples/sec
> INFO:root:Epoch[0] Batch [40] Speed: 76.35 samples/sec
> INFO:root:Epoch[0] Batch [60] Speed: 75.31 samples/sec
> INFO:root:Epoch[0] Batch [80] Speed: 74.59 samples/sec
> INFO:root:Epoch[0] Batch [100] Speed: 75.76 samples/sec
> INFO:root:Epoch[0] Batch [120] Speed: 77.23 samples/sec
> INFO:root:Epoch[0] Batch [140] Speed: 74.44 samples/sec
> INFO:root:Epoch[0] Batch [160] Speed: 74.67 samples/sec
> INFO:root:Epoch[0] Batch [180] Speed: 75.40 samples/sec
> INFO:root:Epoch[0] Batch [200] Speed: 76.74 samples/sec
> INFO:root:Epoch[0] Batch [220] Speed: 75.12 samples/sec
> INFO:root:Epoch[0] Batch [240] Speed: 76.70 samples/sec
> INFO:root:Epoch[0] Batch [260] Speed: 74.27 samples/sec
> INFO:root:Epoch[0] Batch [280] Speed: 75.89 samples/sec
> INFO:root:Epoch[0] Batch [300] Speed: 75.57 samples/sec
> INFO:root:Epoch[0] Batch [320] Speed: 76.34 samples/sec
> INFO:root:Epoch[0] Batch [340] Speed: 75.85 samples/sec
> INFO:root:Epoch[0] Batch [360] Speed: 76.27 samples/sec
> INFO:root:Epoch[0] Batch [380] Speed: 76.11 samples/sec
> INFO:root:Epoch[0] Batch [400] Speed: 76.88 samples/sec
> INFO:root:Epoch[0] Batch [420] Speed: 75.87 samples/sec
> INFO:root:Epoch[0] Batch [440] Speed: 75.08 samples/sec
> INFO:root:Epoch[0] Batch [460] Speed: 76.34 samples/sec
> INFO:root:Epoch[0] Batch [480] Speed: 76.06 samples/sec
> INFO:root:Epoch[0] Batch [500] Speed: 73.84 samples/sec
> INFO:root:Epoch[0] Batch [520] Speed: 69.82 samples/sec
> INFO:root:Epoch[0] Batch [540] Speed: 65.33 samples/sec
> INFO:root:Epoch[0] Batch [560] Speed: 63.28 samples/sec
> INFO:root:Epoch[0] Batch [580] Speed: 59.28 samples/sec
> INFO:root:Epoch[0] Batch [600] Speed: 54.57 samples/sec
> INFO:root:Epoch[0] Batch [620] Speed: 52.37 samples/sec
> INFO:root:Epoch[0] Batch [640] Speed: 51.08 samples/sec
> INFO:root:Epoch[0] Batch [660] Speed: 50.30 samples/sec
> INFO:root:Epoch[0] Batch [680] Speed: 49.22 samples/sec
> INFO:root:Epoch[0] Batch [700] Speed: 49.70 samples/sec
> INFO:root:Epoch[0] Batch [720] Speed: 50.45 samples/sec
> INFO:root:Epoch[0] Batch [740] Speed: 52.21 samples/sec
> INFO:root:Epoch[0] Batch [760] Speed: 54.90 samples/sec
> INFO:root:Epoch[0] Batch [780] Speed: 58.65 samples/sec
> INFO:root:Epoch[0] Batch [800] Speed: 60.69 samples/sec
> INFO:root:Epoch[0] Batch [820] Speed: 66.90 samples/sec
> INFO:root:Epoch[0] Batch [840] Speed: 68.57 samples/sec
> INFO:root:Epoch[0] Batch [860] Speed: 70.10 samples/sec
> INFO:root:Epoch[0] Batch [880] Speed: 70.06 samples/sec
> INFO:root:Epoch[0] Batch [900] Speed: 71.81 samples/sec
> INFO:root:Epoch[0] Batch [920] Speed: 73.46 samples/sec
> INFO:root:Epoch[0] Batch [940] Speed: 72.55 samples/sec
> INFO:root:Epoch[0] Batch [960] Speed: 71.95 samples/sec
> INFO:root:Epoch[0] Batch [980] Speed: 72.64 samples/sec
> INFO:root:Epoch[0] Batch [1000] Speed: 72.28 samples/sec
> INFO:root:Epoch[0] Batch [1020] Speed: 72.63 samples/sec
> INFO:root:Epoch[0] Batch [1040] Speed: 73.61 samples/sec
> INFO:root:Epoch[0] Batch [1060] Speed: 74.30 samples/sec
> INFO:root:Epoch[0] Batch [1080] Speed: 73.47 samples/sec
> INFO:root:Epoch[0] Batch [1100] Speed: 73.09 samples/sec
> INFO:root:Epoch[0] Batch [1120] Speed: 72.78 samples/sec
> INFO:root:Epoch[0] Batch [1140] Speed: 73.37 samples/sec
> INFO:root:Epoch[0] Batch [1160] Speed: 73.40 samples/sec
> INFO:root:Epoch[0] Batch [1180] Speed: 73.53 samples/sec
> INFO:root:Epoch[0] Batch [1200] Speed: 73.48 samples/sec
> INFO:root:Epoch[0] Batch [1220] Speed: 71.79 samples/sec
> INFO:root:Epoch[0] Batch [1240] Speed: 72.09 samples/sec
> INFO:root:Epoch[0] Batch [1260] Speed: 69.93 samples/sec
> INFO:root:Epoch[0] Batch [1280] Speed: 64.27 samples/sec
> INFO:root:Epoch[0] Batch [1300] Speed: 59.91 samples/sec
> INFO:root:Epoch[0] Batch [1320] Speed: 54.84 samples/sec
> INFO:root:Epoch[0] Batch [1340] Speed: 51.23 samples/sec
> INFO:root:Epoch[0] Batch [1360] Speed: 48.56 samples/sec
> INFO:root:Epoch[0] Batch [1380] Speed: 44.69 samples/sec
> INFO:root:Epoch[0] Batch [1400] Speed: 41.22 samples/sec
> INFO:root:Epoch[0] Batch [1420] Speed: 38.33 samples/sec
> INFO:root:Epoch[0] Batch [1440] Speed: 35.85 samples/sec
> INFO:root:Epoch[0] Batch [1460] Speed: 33.92 samples/sec
> INFO:root:Epoch[0] Batch [1480] Speed: 32.97 samples/sec
> INFO:root:Epoch[0] Batch [1500] Speed: 32.97 samples/sec
> INFO:root:Epoch[0] Batch [1520] Speed: 31.58 samples/sec
> INFO:root:Epoch[0] Batch [1540] Speed: 31.68 samples/sec
> INFO:root:Epoch[0] Batch [1560] Speed: 31.93 samples/sec
> INFO:root:Epoch[0] Batch [1580] Speed: 32.16 samples/sec
> INFO:root:Epoch[0] Batch [1600] Speed: 32.04 samples/sec
> INFO:root:Epoch[0] Batch [1620] Speed: 31.45 samples/sec
> INFO:root:Epoch[0] Batch [1640] Speed: 31.57 samples/sec
> INFO:root:Epoch[0] Batch [1660] Speed: 32.46 samples/sec
> INFO:root:Epoch[0] Batch [1680] Speed: 32.22 samples/sec
> INFO:root:Epoch[0] Batch [1700] Speed: 32.88 samples/sec
> INFO:root:Epoch[0] Batch [1720] Speed: 33.13 samples/sec
> INFO:root:Epoch[0] Batch [1740] Speed: 33.05 samples/sec
> INFO:root:Epoch[0] Batch [1760] Speed: 32.21 samples/sec
> INFO:root:Epoch[0] Batch [1780] Speed: 33.01 samples/sec
> INFO:root:Epoch[0] Batch [1800] Speed: 33.49 samples/sec
> INFO:root:Epoch[0] Batch [1820] Speed: 33.01 samples/sec
> INFO:root:Epoch[0] Batch [1840] Speed: 33.17 samples/sec
> INFO:root:Epoch[0] Batch [1860] Speed: 33.39 samples/sec
> INFO:root:Epoch[0] Batch [1880] Speed: 33.39 samples/sec
> INFO:root:Epoch[0] Batch [1900] Speed: 33.29 samples/sec
> INFO:root:Epoch[0] Batch [1920] Speed: 33.13 samples/sec
> INFO:root:Epoch[0] Batch [1940] Speed: 33.12 samples/sec
> INFO:root:Epoch[0] Batch [1960] Speed: 32.88 samples/sec
> INFO:root:Epoch[0] Batch [1980] Speed: 32.72 samples/sec
> INFO:root:Epoch[0] Batch [2000] Speed: 32.14 samples/sec
> INFO:root:Epoch[0] Batch [2020] Speed: 32.45 samples/sec
> INFO:root:Epoch[0] Batch [2040] Speed: 32.40 samples/sec
> INFO:root:Epoch[0] Batch [2060] Speed: 32.16 samples/sec
> INFO:root:Epoch[0] Batch [2080] Speed: 32.09 samples/sec
> INFO:root:Epoch[0] Batch [2100] Speed: 31.88 samples/sec
> INFO:root:Epoch[0] Batch [2120] Speed: 30.83 samples/sec
> INFO:root:Epoch[0] Batch [2140] Speed: 30.39 samples/sec
> INFO:root:Epoch[0] Batch [2160] Speed: 31.89 samples/sec
> INFO:root:Epoch[0] Batch [2180] Speed: 31.49 samples/sec
> INFO:root:Epoch[0] Batch [2200] Speed: 31.35 samples/sec
> INFO:root:Epoch[0] Batch [2220] Speed: 31.48 samples/sec
> INFO:root:Epoch[0] Batch [2240] Speed: 31.22 samples/sec
> INFO:root:Epoch[0] Batch [2260] Speed: 31.34 samples/sec
> INFO:root:Epoch[0] Batch [2280] Speed: 31.03 samples/sec
> INFO:root:Epoch[0] Batch [2300] Speed: 30.95 samples/sec
> INFO:root:Epoch[0] Batch [2320] Speed: 30.93 samples/sec
> INFO:root:Epoch[0] Batch [2340] Speed: 31.31 samples/sec
> INFO:root:Epoch[0] Batch [2360] Speed: 31.19 samples/sec
> INFO:root:Epoch[0] Batch [2380] Speed: 30.80 samples/sec
> INFO:root:Epoch[0] Batch [2400] Speed: 31.07 samples/sec
> INFO:root:Epoch[0] Batch [2420] Speed: 32.53 samples/sec
> INFO:root:Epoch[0] Batch [2440] Speed: 33.25 samples/sec
> INFO:root:Epoch[0] Batch [2460] Speed: 34.40 samples/sec
> INFO:root:Epoch[0] Batch [2480] Speed: 34.67 samples/sec
> INFO:root:Epoch[0] Batch [2500] Speed: 34.40 samples/sec
> INFO:root:Epoch[0] Batch [2520] Speed: 34.52 samples/sec
> INFO:root:Epoch[0] Batch [2540] Speed: 34.53 samples/sec
> INFO:root:Epoch[0] Batch [2560] Speed: 34.61 samples/sec
> INFO:root:Epoch[0] Batch [2580] Speed: 34.70 samples/sec
> INFO:root:Epoch[0] Batch [2600] Speed: 34.65 samples/sec
> INFO:root:Epoch[0] Batch [2620] Speed: 34.56 samples/sec
> INFO:root:Epoch[0] Batch [2640] Speed: 34.56 samples/sec
> INFO:root:Epoch[0] Batch [2660] Speed: 34.41 samples/sec
> INFO:root:Epoch[0] Batch [2680] Speed: 34.38 samples/sec
> INFO:root:Epoch[0] Batch [2700] Speed: 34.21 samples/sec
> INFO:root:Epoch[0] Batch [2720] Speed: 34.06 samples/sec
> INFO:root:Epoch[0] Batch [2740] Speed: 34.12 samples/sec
> INFO:root:Epoch[0] Batch [2760] Speed: 34.20 samples/sec
> INFO:root:Epoch[0] Batch [2780] Speed: 33.93 samples/sec
> INFO:root:Epoch[0] Batch [2800] Speed: 33.88 samples/sec
> INFO:root:Epoch[0] Batch [2820] Speed: 33.88 samples/sec
> INFO:root:Epoch[0] Batch [2840] Speed: 33.79 samples/sec
> INFO:root:Epoch[0] Batch [2860] Speed: 34.14 samples/sec
> INFO:root:Epoch[0] Batch [2880] Speed: 33.47 samples/sec
> INFO:root:Epoch[0] Batch [2900] Speed: 33.59 samples/sec

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
With regards, Apache Git Services