Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/23154 )
Change subject: IMPALA-9846: Enable Aggregated Runtime Profile by Default ...................................................................... Patch Set 13: (13 comments) http://gerrit.cloudera.org:8080/#/c/23154/13//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/23154/13//COMMIT_MSG@16 PS13, Line 16: this change, I see 3 kind of changes in the patch, and it may be useful to split them to separate commits: 1. changing text/json output by adding total + the info that the profile is aggregated 2. changing tests to expect the new profile format 3. changing the default 2 is purely an internal test change, while 1 and 3 are actual product changes (maybe 1 is not, as aggregated profiles were considered experimental) The tests could switch to using aggregated profiles without changing the default for the flag - there are a few flags where the test cluster overrides the default value. (I am not sure whether splitting 2 and 3 is useful) A reason why splitting 1 and 2+3 would be useful is simplifying backporting - I can imagine 1. to be backported if it turns out that someone already uses aggregated profiles to avoid producing different outputs, while backporting 2 would be a much harder due to its size. Another thing worth splitting is that lot of test changes seem to be only needed because the test writer didn't use aggregate(SUM, ...), while it would have made the tests less fragile and allow running the same test with aggregated and legacy profiles. These rewrites could be merged before the rest of the patch. http://gerrit.cloudera.org:8080/#/c/23154/13//COMMIT_MSG@20 PS13, Line 20: The following tests are to be modified to support the usage of : aggregated runtime profile - It would be useful the list of common reasons why the change was needed. http://gerrit.cloudera.org:8080/#/c/23154/13/be/src/util/runtime-profile.cc File be/src/util/runtime-profile.cc: http://gerrit.cloudera.org:8080/#/c/23154/13/be/src/util/runtime-profile.cc@2320 PS13, Line 2320: total This seems like actually changing the text profile format (and also json in another function). Can you mention the format changes in commit message? http://gerrit.cloudera.org:8080/#/c/23154/13/testdata/workloads/functional-query/queries/QueryTest/binary-type.test File testdata/workloads/functional-query/queries/QueryTest/binary-type.test: http://gerrit.cloudera.org:8080/#/c/23154/13/testdata/workloads/functional-query/queries/QueryTest/binary-type.test@167 PS13, Line 167: !row_regex:.*TotalKuduScanRoundTrips: total=1.* This changes the semantics of the regexp, it is still correct though because this specific table only has a single file, so there will be always just one instance scanning it. I think that all 3 should be rewritten to expect 0 as the sum of TotalKuduScanRoundTrips (note that I wrote this test, and now I see that my solution for checking this counter was not that good) http://gerrit.cloudera.org:8080/#/c/23154/13/testdata/workloads/functional-query/queries/QueryTest/compute-stats-incremental.test File testdata/workloads/functional-query/queries/QueryTest/compute-stats-incremental.test: http://gerrit.cloudera.org:8080/#/c/23154/13/testdata/workloads/functional-query/queries/QueryTest/compute-stats-incremental.test@544 PS13, Line 544: row_regex: Partition: year=2010 /day=__HIVE_DEFAULT_PARTITION__\nNumModifiedRows: 1000 I see several tests rewritten to use row_regex - what is the reason for this? Can you mention it in the commit message? http://gerrit.cloudera.org:8080/#/c/23154/13/tests/common/test_result_verifier.py File tests/common/test_result_verifier.py: http://gerrit.cloudera.org:8080/#/c/23154/13/tests/common/test_result_verifier.py@601 PS13, Line 601: # Special syntax for basic aggregation over fields in the runtime profile. This is the same as line 597 http://gerrit.cloudera.org:8080/#/c/23154/13/tests/common/test_result_verifier.py@624 PS13, Line 624: class CounterType(Enum): : AGGREGATED = 1 # Aggregated / Averaged : UNAGGREGATED = 2 I see several UNAGGREGATED in test files. What does this mean? I assumed that all tests use aggregated profiles from this patch http://gerrit.cloudera.org:8080/#/c/23154/13/tests/common/test_result_verifier.py@647 PS13, Line 647: if re.search(r"[^\\]\([^\?]", pattern): : assert False, ("Capturing groups are not allowed(use non-capturing groups)" : + "\nInvalid row regex specification: %s" % row_string) Is there a reason for not allowing this besides being non useful? http://gerrit.cloudera.org:8080/#/c/23154/13/tests/common/test_result_verifier.py@650 PS13, Line 650: # Remove .* and .*? from beginning and ending : pattern = re.sub(r"^\.\*\??|\.\*\??$|^\.\+\??|\.\+\??$", "", pattern) This may make the change even bigger, but it would make more sense to me to throw an error in this case, which would warn the contributor to change the test. Silently ignoring it can lead to people continuing writing such regexps. http://gerrit.cloudera.org:8080/#/c/23154/13/tests/common/test_result_verifier.py@664 PS13, Line 664: aggregaton typo http://gerrit.cloudera.org:8080/#/c/23154/13/tests/common/test_result_verifier.py@698 PS13, Line 698: print(expected) this looks like a debugging print() http://gerrit.cloudera.org:8080/#/c/23154/13/tests/common/test_result_verifier.py@700 PS13, Line 700: if "Profile Type: AGGREGATED" in actual: : agg_profile = True Does to code actually support non-aggregated profiles? Does it need to support it? http://gerrit.cloudera.org:8080/#/c/23154/13/tests/common/test_result_verifier.py@723 PS13, Line 723: print(united_regex) this looks like a debugging print() -- To view, visit http://gerrit.cloudera.org:8080/23154 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If41d6322361fba82c946efd614cc7d28cb1c36e8 Gerrit-Change-Number: 23154 Gerrit-PatchSet: 13 Gerrit-Owner: Surya Hebbar <[email protected]> Gerrit-Reviewer: Abhishek Rawat <[email protected]> Gerrit-Reviewer: Csaba Ringhofer <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Kurt Deschler <[email protected]> Gerrit-Reviewer: Riza Suminto <[email protected]> Gerrit-Comment-Date: Sat, 11 Oct 2025 16:43:44 +0000 Gerrit-HasComments: Yes
