jinhuang415 opened a new issue #10520: MXNet operator profile aggregate counter 
issue
URL: https://github.com/apache/incubator-mxnet/issues/10520
 
 
   It looks the operator counter of the aggregated output is the 2x of real 
value. 
   
   I used below gluon model (mainly copied from 
https://mxnet.incubator.apache.org/tutorials/gluon/gluon.html, contains 2 
convolutions for each forward pass) and did profiling, the aggregated 
convolution counter is 4 instead of 2, the counter doubled for other OPs as 
well.
   
   ```
   # import dependencies
   from __future__ import print_function
   import numpy as np
   import mxnet as mx
   import mxnet.ndarray as F
   import mxnet.gluon as gluon
   from mxnet.gluon import nn
   from mxnet import autograd
   
   class Net(gluon.Block):
       def __init__(self, **kwargs):
           super(Net, self).__init__(**kwargs)
           with self.name_scope():
               # layers created in name_scope will inherit name space
               # from parent layer.
               self.conv1 = nn.Conv2D(6, kernel_size=5)
               self.pool1 = nn.MaxPool2D(pool_size=(2,2))
               self.conv2 = nn.Conv2D(16, kernel_size=5)
               self.pool2 = nn.MaxPool2D(pool_size=(2,2))
               self.fc1 = nn.Dense(120)
               self.fc2 = nn.Dense(84)
               self.fc3 = nn.Dense(10)
   
       def forward(self, x):
           x = self.pool1(F.relu(self.conv1(x)))
           x = self.pool2(F.relu(self.conv2(x)))
           # 0 means copy over size from corresponding dimension.
           # -1 means infer size from the rest of dimensions.
           x = x.reshape((0, -1))
           x = F.relu(self.fc1(x))
           x = F.relu(self.fc2(x))
           x = self.fc3(x)
           return x
   
   net = Net()
   # Initialize on CPU. Replace with `mx.gpu(0)`, or `[mx.gpu(0), mx.gpu(1)]`,
   # etc to use one or more GPUs.
   net.collect_params().initialize(mx.init.Xavier(), ctx=mx.cpu())
   
   mx.profiler.set_config(aggregate_stats=True)
   mx.profiler.set_state('run')
   
   data = mx.nd.random_normal(shape=(10, 1, 32, 32))  # dummy data
   output = net(data)
   output.wait_to_read()
   
   print(mx.profiler.dumps())
   
   ```
   
   
   The output is below:
   
   ```
   [jinhuang@mlt-ace image-classification]$ python test_gluon.py
   
   Profile Statistics.
           Note that counter items are counter values and not time units.
   Device Storage
   =================
   Name                          Total Count        Time (ms)    Min Time (ms)  
  Max Time (ms)    Avg Time (ms)
   ----                          -----------        ---------    -------------  
  -------------    -------------
   Memory: cpu/0                          65         535.0320           0.0240  
       726.1520         363.0640
   
   MXNET_C_API
   =================
   Name                          Total Count        Time (ms)    Min Time (ms)  
  Max Time (ms)    Avg Time (ms)
   ----                          -----------        ---------    -------------  
  -------------    -------------
   MXNDArrayReshape64                      1           0.0140           0.0140  
         0.0140           0.0140
   MXNDArrayFree                          21           0.1100           0.0000  
         0.0290           0.0052
   MXAutogradMarkVariables                10           0.1080           0.0080  
         0.0180           0.0108
   MXNDArrayCreateEx                      10           0.0340           0.0020  
         0.0070           0.0034
   MXNDArrayGetDType                      15           0.0000           0.0000  
         0.0000           0.0000
   MXSymbolSetAttr                        47           0.0920           0.0010  
         0.0130           0.0020
   MXNDArrayGetContext                    13           0.0070           0.0000  
         0.0010           0.0005
   MXNDArrayWaitToRead                     1          12.0930          12.0930  
        12.0930          12.0930
   MXSymbolInferShape                      7           0.5700           0.0490  
         0.2320           0.0814
   MXNet C API Calls                     230           0.2300           0.0010  
         0.2300           0.1145
   MXNet C API Concurrency               459           0.0010           0.0000  
         0.0010           0.0005
   MXSymbolCreateAtomicSymbol               7           0.3780           0.0190 
          0.1540           0.0540
   MXAutogradSetIsRecording               10           0.0140           0.0000  
         0.0070           0.0014
   MXAutogradSetIsTraining                10           0.0040           0.0000  
         0.0010           0.0004
   MXImperativeInvokeEx                   52           3.7600           0.0150  
         1.7250           0.0723
   MXNDArrayGetShape                      25           0.0190           0.0000  
         0.0030           0.0008
   
   operator
   =================
   Name                          Total Count        Time (ms)    Min Time (ms)  
  Max Time (ms)    Avg Time (ms)
   ----                          -----------        ---------    -------------  
  -------------    -------------
   WaitForVar                              2           0.0080           0.0040  
         0.0040           0.0040
   FullyConnected                          6           0.7770           0.0590  
         0.2380           0.1295
   Pooling                                 4           0.5280           0.0890  
         0.1750           0.1320
   _zeros                                 20           0.6220           0.0020  
         0.1410           0.0311
   ResourceParallelRandomSetSeed               2          14.7190           
7.3580           7.3610           7.3595
   DeleteVariable                         40           0.1130           0.0010  
         0.0070           0.0028
   relu                                    8           0.2220           0.0070  
         0.0770           0.0278
   _random_normal                          2          10.8890           5.4430  
         5.4460           5.4445
   _random_uniform                        10           0.7370           0.0100  
         0.2740           0.0737
   _full                                  10           0.0350           0.0010  
         0.0080           0.0035
   Convolution                             4          12.5590           0.5260  
         5.7530           3.1398
   zeros_like                             20           0.3800           0.0030  
         0.1490           0.0190
   CopyCPU2CPU                            20           0.4690           0.0030  
         0.1530           0.0235
   Reorder                                 4           0.0110           0.0010  
         0.0050           0.0027
   ```

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

Reply via email to