Aleksandr Efimov has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/23154 )

Change subject: IMPALA-9846: Enable Aggregated Runtime Profile by Default
......................................................................


Patch Set 30:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/23154/30/be/src/util/runtime-profile.cc
File be/src/util/runtime-profile.cc:

http://gerrit.cloudera.org:8080/#/c/23154/30/be/src/util/runtime-profile.cc@2327
PS30, Line 2327:   bool print_total = !(unit_ == TUnit::TIME_NS || unit_ == 
TUnit::TIME_US ||
This still prints `total=` for derived counters such as `BYTES_PER_SECOND`, 
`UNIT_PER_SECOND`, and `DOUBLE_VALUE` ratios.
Summing rates or ratios across instances is not meaningful and can be actively 
misleading in the aggregated profile.
I think `total` should be limited to counters where summation is semantically 
valid, either by excluding these units here or by carrying explicit 
counter-category metadata for summable vs non-summable counters.


http://gerrit.cloudera.org:8080/#/c/23154/30/be/src/util/runtime-profile.cc@2336
PS30, Line 2336:   if (verbosity <= Verbosity::LEGACY) {
This changes the legacy text profile format as well. With 
`--aggregated_profile=false`, averaged counters that pass `print_total` now 
emit `total=...` and return before printing the old legacy shape.
That seems risky for compatibility with existing tooling that parses 
non-aggregated text profiles.
Can we keep the legacy output unchanged and only add the new `total` field for 
the aggregated/default profile format?


http://gerrit.cloudera.org:8080/#/c/23154/30/tests/common/test_result_verifier.py
File tests/common/test_result_verifier.py:

http://gerrit.cloudera.org:8080/#/c/23154/30/tests/common/test_result_verifier.py@658
PS30, Line 658:     if re.search(r"[^\\]\([^\?]", pattern):
This check does not catch capturing groups at the beginning of the pattern, 
e.g. `row_regex: (Profile Type): AGGREGATED`.
Such a regex is accepted even though the comment says capturing groups are 
disallowed, and it can corrupt the group indexing used later for aggregation 
matches.
I was able to make `verify_runtime_profile()` crash with a `TypeError` by 
placing such a `row_regex` before an `aggregation(...)`
expectation.
This should reject any capturing group, including one at position 0.


http://gerrit.cloudera.org:8080/#/c/23154/30/tests/common/test_result_verifier.py@685
PS30, Line 685:       pattern = r"{0}: total=\d+(?:.\d+)? ?(?:[KMB]+)? 
\((\d+(?:.\d+)?)\)".format(field)
This regex only allows `[KMB]+` as the human-readable unit, so it does not 
match counters printed as `GB` or `TB`.
For example, `BytesRead: total=1.00 GB (1073741824)` fails to satisfy 
`aggregation(SUM, BytesRead): 1073741824`. Since the exact value is already in 
parentheses, the matcher should either ignore the display unit entirely or 
allow the full set of units emitted by `PrettyPrinter`.



--
To view, visit http://gerrit.cloudera.org:8080/23154
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If41d6322361fba82c946efd614cc7d28cb1c36e8
Gerrit-Change-Number: 23154
Gerrit-PatchSet: 30
Gerrit-Owner: Surya Hebbar <[email protected]>
Gerrit-Reviewer: Abhishek Rawat <[email protected]>
Gerrit-Reviewer: Aleksandr Efimov <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Joe McDonnell <[email protected]>
Gerrit-Reviewer: Kurt Deschler <[email protected]>
Gerrit-Reviewer: Michael Smith <[email protected]>
Gerrit-Reviewer: Noemi Pap-Takacs <[email protected]>
Gerrit-Reviewer: Quanlong Huang <[email protected]>
Gerrit-Reviewer: Riza Suminto <[email protected]>
Gerrit-Reviewer: Surya Hebbar <[email protected]>
Gerrit-Comment-Date: Fri, 19 Jun 2026 15:25:00 +0000
Gerrit-HasComments: Yes

Reply via email to