Jiawei Wang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14776 )

Change subject: IMPALA-7550: Add documentation to profile counters
......................................................................


Patch Set 6:

(27 comments)

Hi Balazs,

Thanks so much for help correct the profile counters descriptions!

That's really helpful! Agree that we need to keep code out of stable counters 
descriptions. Unfortunately, I only have so little knowledge on the scan 
counters so that a lot of counters I am not sure how to fix the description. 
Ask @Tim Armstrong and @Lars to help take a look on that.

Also, if you have strong options how to update the counters. Please feel free 
to let me know. Have already applied a few suggestions you made.

Thanks
Jiawei

http://gerrit.cloudera.org:8080/#/c/14776/5//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/14776/5//COMMIT_MSG@37
PS5, Line 37: 2. Profile counters are also annotated with their significance to 
users.
            : * STABLE_HIGH - High level and stable counters, always useful on 
measuring
            : query performance and status. Counters that everyone is 
interested. should
            : rarely change and if it does we will make some effort to notify 
users.
            :
            : * STABLE_LOW - Low level and stable counters. Interesting 
counters to monitor
            :  and analyze by machine. It will probably be interesting under 
some
            :  circumstance for users. Lots of developers are interested.
            :
            : * Unstable - Unstable but useful. Useful to understand query 
performance,
            : but subject to change, particularly if the implementation changes.
            : E.g. MaterializeTupleTimer
            :
            : * Debug -  Debugging counters. Generally not useful to users of 
Impala,
            :  the main use case is low-level debugging. Can be hidden to 
reduce noise
            :  for most consumers of profiles.
            :
            : 3. We have around 250 counters. This commit did the replacement in
            : scan-node and hdfs-scan-node-base and coordinator.
            :
> Please simplify this to what's in the code - I think the descriptions in Si
Done


http://gerrit.cloudera.org:8080/#/c/14776/5/be/src/exec/hdfs-scan-node-base.cc
File be/src/exec/hdfs-scan-node-base.cc:

http://gerrit.cloudera.org:8080/#/c/14776/5/be/src/exec/hdfs-scan-node-base.cc@70
PS5, Line 70: across all Disk I/O threads in HDFS read operations.");
            : PROFILE_DEFINE_TIMER(TotalRawHdfsOpenFileTime, STABLE_LOW, "The 
tota
> Instead of including an explanation in individual description (but not in e
Done


http://gerrit.cloudera.org:8080/#/c/14776/5/be/src/exec/hdfs-scan-node-base.cc@79
PS5, Line 79: lectionItemsRead, STABLE_LOW, TUnit::UNIT,
> Is this included in profiles where it's irrelevant? If no, remove this to a
Not sure about it.
@Lars and @Tarmstrong maybe can help to take a look?


http://gerrit.cloudera.org:8080/#/c/14776/5/be/src/exec/hdfs-scan-node-base.cc@85
PS5, Line 85: _DEFINE_SA
> Maybe '...remote data source...'? Disk queue is not a term users will be fa
@Lars and @Tarmstrong


http://gerrit.cloudera.org:8080/#/c/14776/5/be/src/exec/hdfs-scan-node-base.cc@88
PS5, Line 88: ty of the sys
> Than what? - Is there a point of comparison we can provide? For example, 'v
Not sure... if someone have better knowledge than this might want to fix it...


http://gerrit.cloudera.org:8080/#/c/14776/5/be/src/exec/hdfs-scan-node-base.cc@89
PS5, Line 89:  the s
> scan
Done


http://gerrit.cloudera.org:8080/#/c/14776/5/be/src/exec/hdfs-scan-node-base.cc@90
PS5, Line 90: it::BYTES,
> I don't think we should speculate on root causes - there can be many reason
Done


http://gerrit.cloudera.org:8080/#/c/14776/5/be/src/exec/hdfs-scan-node-base.cc@121
PS5, Line 121:  the scanner thread was ready to process "
             :     "the data. High values show that scanner threa
> Isn't this the same as 'HDFS scans'?
IDK to be honest... @Tarmstrong and @Lars


http://gerrit.cloudera.org:8080/#/c/14776/5/be/src/exec/hdfs-scan-node-base.cc@125
PS5, Line 125: pressedBytesReadPerColumn, STABLE_LOW,
             :     TUnit::BYTES, "Stats a
> replace: Note that this includes the time when
Done


http://gerrit.cloudera.org:8080/#/c/14776/5/be/src/exec/hdfs-scan-node-base.cc@130
PS5, Line 130: , "Stats about the number of compressed bytes read per column. "
             :     "Each sample in the counter is the size of a single column 
that is scanned by the "
             :     "scan node.");
> remove?
Done


http://gerrit.cloudera.org:8080/#/c/14776/5/be/src/exec/hdfs-scan-node-base.cc@136
PS5, Line 136:  of data cache partially hit");
             : PROFILE_DEFINE_COUNTER(DataCacheMissCount, STABLE_HIGH, 
TUnit::UNIT,
             :     "Total count of data
> remove?
Done


http://gerrit.cloudera.org:8080/#/c/14776/5/be/src/exec/scan-node.cc
File be/src/exec/scan-node.cc:

http://gerrit.cloudera.org:8080/#/c/14776/5/be/src/exec/scan-node.cc@57
PS5, Line 57: Only implemented for scan node subclasses that expose "
            :     "the byte
> I'd leave this to the BytesReadSeries description.
Done


http://gerrit.cloudera.org:8080/#/c/14776/5/be/src/exec/scan-node.cc@58
PS5, Line 58: read, e.g. HDFS and HBase.");
            : PROFILE_DEFINE_COUNTER(R
> If this refers to the time series version, move there. If the time series i
Not sure about this. @Lars or @Tarmstrong might have deeper understanding on 
this


http://gerrit.cloudera.org:8080/#/c/14776/5/be/src/exec/scan-node.cc@63
PS5, Line 63: me that
> STABLE_LOW?
Done


http://gerrit.cloudera.org:8080/#/c/14776/5/be/src/exec/scan-node.cc@66
PS5, Line 66: e series of BytesRead that samples the BytesRead counter.");
            : PROFILE_DEFINE_TIMER(MaterializeTupleTime, UNSTABLE, "Wall
> This also includes time spent in the scanners, ie. applying predicates, mak
Done


http://gerrit.cloudera.org:8080/#/c/14776/5/be/src/exec/scan-node.cc@74
PS5, Line 74: er of scanner threads can fluctuate during
> More than one scanner thread can be spun up to process data from a single s
Done


http://gerrit.cloudera.org:8080/#/c/14776/5/be/src/exec/scan-node.cc@85
PS5, Line 85: , TUnit::BYTES,
            :     "Peak memory con
> Aggregate wall clock time across all scanner threads [...]
Done


http://gerrit.cloudera.org:8080/#/c/14776/5/be/src/exec/scan-node.cc@94
PS5, Line 94: t string ScanNode::SCANNER_THREAD_COUNTERS_PREFIX = 
"ScannerThreads";
> remove?
Done


http://gerrit.cloudera.org:8080/#/c/14776/5/be/src/runtime/coordinator.cc
File be/src/runtime/coordinator.cc:

http://gerrit.cloudera.org:8080/#/c/14776/5/be/src/runtime/coordinator.cc@88
PS5, Line 88: ry in exchange no
> I'd be hand wavy and add '(across the network)' to help interpret this with
Done.
Not sure if I understand this correctly. Please help to check this.


http://gerrit.cloudera.org:8080/#/c/14776/5/be/src/runtime/coordinator.cc@91
PS5, Line 91:  of
> clarify: sent 'across the network' (exchange), or just returned further up
Done, as above


http://gerrit.cloudera.org:8080/#/c/14776/5/be/src/runtime/coordinator.cc@93
PS5, Line 93: R(To
> clarify, as above
Done, as above.


http://gerrit.cloudera.org:8080/#/c/14776/5/be/src/runtime/coordinator.cc@97
PS5, Line 97: OUBLE_VALUE,
            :     "The ratio between TotalScanByteSent and TotalBytesRead, i.e
> If both counters' descriptions are correct, I think this is not true. A sca
I only have limited knowledge of this part... So I am not sure if there is a 
misunderstanding or a bug. Maybe @Lars or @Tarmstrong can help to take a look?


http://gerrit.cloudera.org:8080/#/c/14776/5/be/src/runtime/coordinator.cc@104
PS5, Line 104: mCompletedBackends, STABLE_HIGH, TUnit::UNIT,"The number of "
             :     "completed backends. Only valid after
> remove?
Done


http://gerrit.cloudera.org:8080/#/c/14776/5/be/src/runtime/coordinator.cc@105
PS5, Line 105: ll backends have started executing. "
             :     "Does n
> Can this be rephrased in less specific terms, e.g. 'after all backends have
Done


http://gerrit.cloudera.org:8080/#/c/14776/5/be/src/util/runtime-profile-counters.h
File be/src/util/runtime-profile-counters.h:

http://gerrit.cloudera.org:8080/#/c/14776/5/be/src/util/runtime-profile-counters.h@146
PS5, Line 146: stabl
> nit: stable
Done


http://gerrit.cloudera.org:8080/#/c/14776/5/www/profile_docs.tmpl
File www/profile_docs.tmpl:

http://gerrit.cloudera.org:8080/#/c/14776/5/www/profile_docs.tmpl@29
PS5, Line 29: <th>Name</th>
            :           <th>Significance</th>
            :           <th>Unit</th>
            :           <th>Description</th>
            :         </tr>
            :       </thead>
            :       <tbody>
            :         {{#profile_docs}}
            :         <tr>
            :           <td>
            :           {{name}}
            :           </td>
            :           <td>
            :           {{significance}}
            :           </td>
            :           <td>
            :           {{unit}}
            :           </td>
            :           <td>
            :           {{description}}
            :           </td>
            :         </tr>
> Looks like stability and significance needs to be merged in the header. Ple
Thanks for catching that! I forget to delete in the header.

Not sure what do you mean by adding the descriptions? The description of 
significance level? If so, where do we expose it in the web UI page.


http://gerrit.cloudera.org:8080/#/c/14776/5/www/query_profile.tmpl
File www/query_profile.tmpl:

http://gerrit.cloudera.org:8080/#/c/14776/5/www/query_profile.tmpl@40
PS5, Line 40: iv>
            :
            : <pre>{{profile}}</pre>
            :
            : <script>
> The profile counters should be referenced from outside the context of a sin
Moving it under /queries page.

https://drive.google.com/open?id=12VsFPdJbXNdkIpXgHB0ez85CPSYlGFif

But I feel the description is specific for query profile so it should be 
located somewhere more related to that? Not sure if it is a good place...

Keep it in /query_profile probably makes more sense?
https://drive.google.com/open?id=1aAAu8abV4k2yu75qL9zit1u3Q0JJwsaZ

Need options from others...



--
To view, visit http://gerrit.cloudera.org:8080/14776
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idc03faddb27754001290bb6d899840e2cbe7ccb7
Gerrit-Change-Number: 14776
Gerrit-PatchSet: 6
Gerrit-Owner: Jiawei Wang <[email protected]>
Gerrit-Reviewer: Andrew Sherman <[email protected]>
Gerrit-Reviewer: Balazs Jeszenszky <[email protected]>
Gerrit-Reviewer: David Rorke <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Jiawei Wang <[email protected]>
Gerrit-Reviewer: Jiawei Wang <[email protected]>
Gerrit-Reviewer: Lars Volker <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-Comment-Date: Thu, 05 Dec 2019 20:28:30 +0000
Gerrit-HasComments: Yes

Reply via email to