Balazs Jeszenszky has posted comments on this change. ( http://gerrit.cloudera.org:8080/14776 )
Change subject: IMPALA-7550: Add documentation to profile counters ...................................................................... Patch Set 5: (28 comments) Looks nice. I haven't looked at the significance fields for individual counters yet. In general, all but DEBUG descriptions should avoid Impala-internal terminology as much as possible. The target audience will never want to open Impala's code, so referencing function names, etc. will be frustrating. Some are harder to word around than others (e.g. scan range), I skipped those for now. http://gerrit.cloudera.org:8080/#/c/14776/5//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/14776/5//COMMIT_MSG@37 PS5, Line 37: 2. Profile counters are annotated with their stability: : * Stable counters - generally useful to understand query performance, : should only change rarely and if it does we'll make some effort to : notify users. E.g. BytesRead. : * Unstable but useful - useful to understand query performance, but : subject to change, particularly if the implementation changes. E.g. : RowBatchQueuePutWaitTime, MaterializeTupleTimer : * Debugging counters - generally not useful to users of Impala, the main : use case is low-level debugging. Can be hidden to reduce noise for most : consumers of profiles. : : 3. Profile counters are also annotated with their significance to users. : * Critical level counters - always useful on measuring query performance and status. : Counters that everyone are interested. : * High level counters - generally interesting counters. Most of the users will be : interested and all the developers are very interested. : * Medium level counters - somehow interesting counters to monitor. It will probably be : interesting under some circumstance. Lot of developers are interested. : * Low level counters - not interesting to users. Should be useful for developers : to debug only. Please simplify this to what's in the code - I think the descriptions in Significance are a good explanation of the different levels. http://gerrit.cloudera.org:8080/#/c/14776/5/be/src/exec/hdfs-scan-node-base.cc File be/src/exec/hdfs-scan-node-base.cc: http://gerrit.cloudera.org:8080/#/c/14776/5/be/src/exec/hdfs-scan-node-base.cc@70 PS5, Line 70: by Disk I/O threads in HDFS read operations. For example, if we have 3 reading " : "threads and each spent 1 sec, this counter will report 3 sec.") Instead of including an explanation in individual description (but not in e.g. the next one), I'd stick to a common language that's clear enough by itself for all 'wall clock timers over multiple threads' descriptions. Maybe 'Aggregate wall clock time across all Disk I/O threads...'? http://gerrit.cloudera.org:8080/#/c/14776/5/be/src/exec/hdfs-scan-node-base.cc@79 PS5, Line 79: concept of \"scan range\", including HDFS and Kudu. Is this included in profiles where it's irrelevant? If no, remove this to avoid confusion. If yes, it'd be better to enumerate the full list since 'scan range' is not a user-facing term. http://gerrit.cloudera.org:8080/#/c/14776/5/be/src/exec/hdfs-scan-node-base.cc@85 PS5, Line 85: disk queue Maybe '...remote data source...'? Disk queue is not a term users will be familiar with. Or, again, just enumerate. Is HDFS remote read overall a single disk queue, or individual remote read targets? Same for S3. http://gerrit.cloudera.org:8080/#/c/14776/5/be/src/exec/hdfs-scan-node-base.cc@88 PS5, Line 88: Higher values Than what? - Is there a point of comparison we can provide? For example, 'values close to the number of disks accessed'? http://gerrit.cloudera.org:8080/#/c/14776/5/be/src/exec/hdfs-scan-node-base.cc@89 PS5, Line 89: thread scan http://gerrit.cloudera.org:8080/#/c/14776/5/be/src/exec/hdfs-scan-node-base.cc@90 PS5, Line 90: because of I don't think we should speculate on root causes - there can be many reasons for this. http://gerrit.cloudera.org:8080/#/c/14776/5/be/src/exec/hdfs-scan-node-base.cc@121 PS5, Line 121: traditional HDFS scan nodes and the scan " : "node total time for the MT_DOP > 1 scan nodes Isn't this the same as 'HDFS scans'? http://gerrit.cloudera.org:8080/#/c/14776/5/be/src/exec/hdfs-scan-node-base.cc@122 PS5, Line 122: Low values show Don't scanner threads shut down if there's no work to do? IIUC, slow reads result in low ScannerIoWaitTime and also low AverageNumScannerThreads. If that's correct, I'd just focus on interpreting high values. http://gerrit.cloudera.org:8080/#/c/14776/5/be/src/exec/hdfs-scan-node-base.cc@125 PS5, Line 125: Note that if CPU load is high, this " : "can include time that replace: Note that this includes the time when http://gerrit.cloudera.org:8080/#/c/14776/5/be/src/exec/hdfs-scan-node-base.cc@130 PS5, Line 130: The scan node tracks the number of bytes read for each column it " : "processes, and when the scan node is closed, it updates these counters with the " : "size of each column remove? http://gerrit.cloudera.org:8080/#/c/14776/5/be/src/exec/hdfs-scan-node-base.cc@136 PS5, Line 136: The scan node tracks the number of bytes read for each column it " : "processes, and when the scan node is closed, it updates these counters with the " : "size of each column. remove? http://gerrit.cloudera.org:8080/#/c/14776/5/be/src/exec/scan-node.cc File be/src/exec/scan-node.cc: http://gerrit.cloudera.org:8080/#/c/14776/5/be/src/exec/scan-node.cc@57 PS5, Line 57: Provided as a counter as well as a time series that samples the" : " counter I'd leave this to the BytesReadSeries description. http://gerrit.cloudera.org:8080/#/c/14776/5/be/src/exec/scan-node.cc@58 PS5, Line 58: Only implemented for scan node subclasses that expose the bytes read, " : "e.g. HDFS and HBase If this refers to the time series version, move there. If the time series is not included where it can't be populated, remove entirely. http://gerrit.cloudera.org:8080/#/c/14776/5/be/src/exec/scan-node.cc@63 PS5, Line 63: UNSTABLE STABLE_LOW? http://gerrit.cloudera.org:8080/#/c/14776/5/be/src/exec/scan-node.cc@66 PS5, Line 66: read from disks. If this is the only scan executing, ideally this will " : "approach the maximum bandwidth supported by the disks. This also includes time spent in the scanners, ie. applying predicates, making this not directly dependent on read speed. I'd just say '..aggregate rate data is scanned' and leave it at that. http://gerrit.cloudera.org:8080/#/c/14776/5/be/src/exec/scan-node.cc@74 PS5, Line 74: This is at most the number of scan ranges More than one scanner thread can be spun up to process data from a single scan range, so this counter is not capped at scan range count. http://gerrit.cloudera.org:8080/#/c/14776/5/be/src/exec/scan-node.cc@85 PS5, Line 85: Wall clock time that the " : "scanner threads Aggregate wall clock time across all scanner threads [...] http://gerrit.cloudera.org:8080/#/c/14776/5/be/src/exec/scan-node.cc@94 PS5, Line 94: "This should be created in Open and stopped when all the scanner threads are done." remove? http://gerrit.cloudera.org:8080/#/c/14776/5/be/src/runtime/coordinator.cc File be/src/runtime/coordinator.cc: http://gerrit.cloudera.org:8080/#/c/14776/5/be/src/runtime/coordinator.cc@88 PS5, Line 88: in exchange nodes I'd be hand wavy and add '(across the network)' to help interpret this without having to understand exchanges. http://gerrit.cloudera.org:8080/#/c/14776/5/be/src/runtime/coordinator.cc@91 PS5, Line 91: sent clarify: sent 'across the network' (exchange), or just returned further up the tree? http://gerrit.cloudera.org:8080/#/c/14776/5/be/src/runtime/coordinator.cc@93 PS5, Line 93: sent clarify, as above http://gerrit.cloudera.org:8080/#/c/14776/5/be/src/runtime/coordinator.cc@97 PS5, Line 97: i.e. the selectivity over " : "all fragment instances that had a scan node in their plan." If both counters' descriptions are correct, I think this is not true. A scan on the right side of a broadcast join will multiply the bytes returned by the scan with the number of hosts for TotalScanByteSent, potentially having a larger TotalScanByteSent than the amount of bytes read by the scan. http://gerrit.cloudera.org:8080/#/c/14776/5/be/src/runtime/coordinator.cc@104 PS5, Line 104: Updated for each Backend in 'UpdateBackendExecStatus' when " : "'ApplyExecStatusReport' returns true. remove? http://gerrit.cloudera.org:8080/#/c/14776/5/be/src/runtime/coordinator.cc@105 PS5, Line 105: Only valid after InitBackendStates() is " : "called Can this be rephrased in less specific terms, e.g. 'after all backends have started executing', or something similar? http://gerrit.cloudera.org:8080/#/c/14776/5/be/src/util/runtime-profile-counters.h File be/src/util/runtime-profile-counters.h: http://gerrit.cloudera.org:8080/#/c/14776/5/be/src/util/runtime-profile-counters.h@146 PS5, Line 146: sable nit: stable http://gerrit.cloudera.org:8080/#/c/14776/5/www/profile_docs.tmpl File www/profile_docs.tmpl: http://gerrit.cloudera.org:8080/#/c/14776/5/www/profile_docs.tmpl@29 PS5, Line 29: <th>Name</th> : <th>Stability</th> : <th>Significance</th> : <th>Unit</th> : <th>Description</th> : </tr> : </thead> : <tbody> : {{#profile_docs}} : <tr> : <td> : {{name}} : </td> : <td> : {{significance}} : </td> : <td> : {{unit}} : </td> : <td> : {{description}} : </td> Looks like stability and significance needs to be merged in the header. Please add descriptions of the stability-significance merged field (expose the comments from the enum as a second field?). http://gerrit.cloudera.org:8080/#/c/14776/5/www/query_profile.tmpl File www/query_profile.tmpl: http://gerrit.cloudera.org:8080/#/c/14776/5/www/query_profile.tmpl@40 PS5, Line 40: <h4> : <a style="font-size:16px;" : href="{{ __common__.host-url }}/profile_docs">Profile Documentation : </a> : </h4> The profile counters should be referenced from outside the context of a single profile since they are useful in general and most people don't use the webUI to browse profiles. It'd be better in common-header or directly under /queries. -- To view, visit http://gerrit.cloudera.org:8080/14776 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Idc03faddb27754001290bb6d899840e2cbe7ccb7 Gerrit-Change-Number: 14776 Gerrit-PatchSet: 5 Gerrit-Owner: Jiawei Wang <[email protected]> Gerrit-Reviewer: Andrew Sherman <[email protected]> Gerrit-Reviewer: Balazs Jeszenszky <[email protected]> Gerrit-Reviewer: David Rorke <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Jiawei Wang <[email protected]> Gerrit-Reviewer: Jiawei Wang <[email protected]> Gerrit-Reviewer: Lars Volker <[email protected]> Gerrit-Reviewer: Tim Armstrong <[email protected]> Gerrit-Comment-Date: Thu, 05 Dec 2019 14:30:49 +0000 Gerrit-HasComments: Yes
