Aleksandr Efimov has posted comments on this change. ( http://gerrit.cloudera.org:8080/23154 )
Change subject: IMPALA-9846: Enable Aggregated Runtime Profile by Default ...................................................................... Patch Set 30: (4 comments) http://gerrit.cloudera.org:8080/#/c/23154/30/be/src/util/runtime-profile.cc File be/src/util/runtime-profile.cc: http://gerrit.cloudera.org:8080/#/c/23154/30/be/src/util/runtime-profile.cc@2327 PS30, Line 2327: bool print_total = !(unit_ == TUnit::TIME_NS || unit_ == TUnit::TIME_US || This still prints `total=` for derived counters such as `BYTES_PER_SECOND`, `UNIT_PER_SECOND`, and `DOUBLE_VALUE` ratios. Summing rates or ratios across instances is not meaningful and can be actively misleading in the aggregated profile. I think `total` should be limited to counters where summation is semantically valid, either by excluding these units here or by carrying explicit counter-category metadata for summable vs non-summable counters. http://gerrit.cloudera.org:8080/#/c/23154/30/be/src/util/runtime-profile.cc@2336 PS30, Line 2336: if (verbosity <= Verbosity::LEGACY) { This changes the legacy text profile format as well. With `--aggregated_profile=false`, averaged counters that pass `print_total` now emit `total=...` and return before printing the old legacy shape. That seems risky for compatibility with existing tooling that parses non-aggregated text profiles. Can we keep the legacy output unchanged and only add the new `total` field for the aggregated/default profile format? http://gerrit.cloudera.org:8080/#/c/23154/30/tests/common/test_result_verifier.py File tests/common/test_result_verifier.py: http://gerrit.cloudera.org:8080/#/c/23154/30/tests/common/test_result_verifier.py@658 PS30, Line 658: if re.search(r"[^\\]\([^\?]", pattern): This check does not catch capturing groups at the beginning of the pattern, e.g. `row_regex: (Profile Type): AGGREGATED`. Such a regex is accepted even though the comment says capturing groups are disallowed, and it can corrupt the group indexing used later for aggregation matches. I was able to make `verify_runtime_profile()` crash with a `TypeError` by placing such a `row_regex` before an `aggregation(...)` expectation. This should reject any capturing group, including one at position 0. http://gerrit.cloudera.org:8080/#/c/23154/30/tests/common/test_result_verifier.py@685 PS30, Line 685: pattern = r"{0}: total=\d+(?:.\d+)? ?(?:[KMB]+)? \((\d+(?:.\d+)?)\)".format(field) This regex only allows `[KMB]+` as the human-readable unit, so it does not match counters printed as `GB` or `TB`. For example, `BytesRead: total=1.00 GB (1073741824)` fails to satisfy `aggregation(SUM, BytesRead): 1073741824`. Since the exact value is already in parentheses, the matcher should either ignore the display unit entirely or allow the full set of units emitted by `PrettyPrinter`. -- To view, visit http://gerrit.cloudera.org:8080/23154 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If41d6322361fba82c946efd614cc7d28cb1c36e8 Gerrit-Change-Number: 23154 Gerrit-PatchSet: 30 Gerrit-Owner: Surya Hebbar <[email protected]> Gerrit-Reviewer: Abhishek Rawat <[email protected]> Gerrit-Reviewer: Aleksandr Efimov <[email protected]> Gerrit-Reviewer: Csaba Ringhofer <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Joe McDonnell <[email protected]> Gerrit-Reviewer: Kurt Deschler <[email protected]> Gerrit-Reviewer: Michael Smith <[email protected]> Gerrit-Reviewer: Noemi Pap-Takacs <[email protected]> Gerrit-Reviewer: Quanlong Huang <[email protected]> Gerrit-Reviewer: Riza Suminto <[email protected]> Gerrit-Reviewer: Surya Hebbar <[email protected]> Gerrit-Comment-Date: Fri, 19 Jun 2026 15:25:00 +0000 Gerrit-HasComments: Yes
