Hello Xuebin Su, Csaba Ringhofer, Impala Public Jenkins, I'd like you to reexamine a change. Please visit
http://gerrit.cloudera.org:8080/22062

to look at the new patch set (#7).

Change subject: IMPALA-13487: Add profile counters for memory allocation
......................................................................

IMPALA-13487: Add profile counters for memory allocation

This patch adds profile counters to help identify memory-bound queries,
i.e. queries whose duration is mostly spent in memory operations.
The following counters are added:
- Thread-level page faults: TotalThreadsMinorPageFaults,
  TotalThreadsMajorPageFaults.
- MemPool counters for the tuple_mem_pool and aux_mem_pool of the
  scratch batch in columnar scanners:
  - ScratchBatchMemAllocDuration
  - ScratchBatchMemAllocBytes
- Duration of RowBatch::Reset() called in FragmentInstanceState::Exec:
  RowBatchResetTime.
- Duration of HdfsColumnarScanner::GetCollectionMemory(), which includes
  memory allocation for collection values and the memcpy when doubling
  the tuple buffer: MaterializeCollectionGetMemTime.

Here is an example of a memory-bound query:

Fragment Instance:
   - RowBatchResetTime: 343.997ms
   - TotalStorageWaitTime: 38.999ms
   - TotalThreadsInvoluntaryContextSwitches: 0 (0)
   - TotalThreadsMajorPageFaults: 0 (0)
   - TotalThreadsMinorPageFaults: 578.60K (578601)
   - TotalThreadsTotalWallClockTime: 9s411ms
     - TotalThreadsSysTime: 1s463ms
     - TotalThreadsUserTime: 1s850ms
   - TotalThreadsVoluntaryContextSwitches: 4.96K (4957)
   - TotalTime: 9s411ms
  ...
  HDFS_SCAN_NODE (id=0):
     - ScratchBatchMemAllocBytes: (Avg: 486.33 KB (498004) ; Min: 4.00 KB (4096) ; Max: 512.00 KB (524288) ; Sum: 1.19 GB (1274890240) ; Number of samples: 2560)
     - ScratchBatchMemAllocDuration: (Avg: 1.986ms ; Min: 0.000ns ; Max: 40.999ms ; Sum: 5s084ms ; Number of samples: 2560)
     - ScratchBatchMemFreeDuration: 0.000ns (Number of samples: 0)
     - DecompressionTime: 1s333ms
     - MaterializeCollectionGetMemTime: 4s910ms
     - MaterializeTupleTime: 6s722ms
     - ScannerIoWaitTime: 38.999ms
     - TotalRawHdfsOpenFileTime: 0.000ns
     - TotalRawHdfsReadTime: 315.998ms
     - TotalTime: 9s013ms

The fragment instance took 9s411ms to finish, of which 343.997ms was
spent resetting the final RowBatch. The majority of the time was spent
in the scan node (9s013ms), mostly in DecompressionTime +
MaterializeTupleTime + ScannerIoWaitTime + TotalRawHdfsReadTime, with
MaterializeTupleTime (6s722ms) dominating. ScratchBatchMemAllocDuration
shows that invoking std::malloc() while materializing the scratch
batches took 5s084ms overall. MaterializeCollectionGetMemTime shows
that allocating memory for collection values and copying memory when
doubling the tuple buffer took 4s910ms. So materializing the
collections took most of the time.

The other observation is that TotalThreadsTotalWallClockTime is much
higher than TotalThreadsSysTime + TotalThreadsUserTime, and there is a
large number of TotalThreadsVoluntaryContextSwitches. This indicates
the thread is waiting on resources (e.g. locks) for long durations. In
the above case, it's waiting for locks in tcmalloc memory allocation
(an off-cpu flame graph is needed to reveal this).

Implementation of the MemPool counters:
Add an optional MemPoolCounters object owned by MemPool callers (e.g.
the scanner) so the counters can have a longer life cycle than the
MemPools themselves. Note that some counters are updated in the
destructor of MemPool, hence the need for the longer life cycle.
MemPoolCounters is currently an optional parameter of the MemPool
constructor, since MemPool is widely used in the code base.
Callers that don't need to track MemPool counters keep passing it as
nullptr. Currently, only the MemPool counters of the scratch batch in
columnar scanners are tracked.

This patch also moves GetCollectionMemory() from HdfsScanner to
HdfsColumnarScanner since it's only used by the Parquet and ORC
scanners.

Tests:
 - tested by manually reproducing the memory-bound queries
 - added e2e tests

Change-Id: I982315d96e6de20a3616f3bd2a2b4866d1ff4710
---
M be/src/exec/hdfs-columnar-scanner.cc
M be/src/exec/hdfs-columnar-scanner.h
M be/src/exec/hdfs-scanner.cc
M be/src/exec/hdfs-scanner.h
M be/src/exec/scratch-tuple-batch.h
M be/src/runtime/fragment-instance-state.cc
M be/src/runtime/mem-pool.cc
M be/src/runtime/mem-pool.h
M be/src/util/impala-profile-tool.cc
M be/src/util/runtime-profile-counters.h
M be/src/util/runtime-profile.cc
M tests/query_test/test_observability.py
12 files changed, 200 insertions(+), 41 deletions(-)

git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/62/22062/7
--
To view, visit http://gerrit.cloudera.org:8080/22062
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I982315d96e6de20a3616f3bd2a2b4866d1ff4710
Gerrit-Change-Number: 22062
Gerrit-PatchSet: 7
Gerrit-Owner: Quanlong Huang <huangquanl...@gmail.com>
Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <huangquanl...@gmail.com>
Gerrit-Reviewer: Xuebin Su <x...@cloudera.com>