Surya Hebbar has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/23154 )

Change subject: IMPALA-9846: Enable Aggregated Runtime Profile by Default
......................................................................


Patch Set 19:

(18 comments)

Sorry for the late response. I was on PTO for more than a week due to my health.

Thank you for the comments.

http://gerrit.cloudera.org:8080/#/c/23154/13//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/23154/13//COMMIT_MSG@16
PS13, Line 16:  are substan
> I see 3 kind of changes in the patch, and it may be useful to split them to
Thank you for suggesting this. I will try to split the change into separate 
commits.

For rewriting row_regex as aggregate(SUM,..), we would need to manually search 
such row_regex and sum up the actual values for rewriting the tests. Both 
searching for these row_regex and manually summing these values will cause 
increasing delays for enabling aggregated runtime profile patch.

After enabling aggregated runtime profile, in later changes we can improve the 
tests to use aggregate instead of row_regex, as this will be a delaying, 
verbose task and is not required for enabling aggregated profile.

I can combine 1 + 2 into one change and 3 into another change. This way the 
tests have to run for traditional profile first, then with the next change, the 
tests will run for aggregated profile.

But, separating 1 and 2 would not be possible. As all the tests will break, 
when the profile output changes due to 1. Hence, 1 and 2 are inseparable for 
passing the tests successfully.

So, finally the change can be broken into two patches.
 A. 1 + 2 - For testing traditional profile mode as default
 B. 3 - For testing aggregated profile mode as default

Please let me know, if I have misunderstood this. Thank you again for the 
comments.


http://gerrit.cloudera.org:8080/#/c/23154/13//COMMIT_MSG@20
PS13, Line 20: by transforming instance-level counters into operator level 
arrays
             : or maps.
> It would be useful the list of common reasons why the change was needed.
Done. I have enlisted the reasons now.


http://gerrit.cloudera.org:8080/#/c/23154/18//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/23154/18//COMMIT_MSG@17
PS18, Line 17: 'mt_dop' values.
> I would prefer just aggregated_profile.
Done. I have updated the name now.


http://gerrit.cloudera.org:8080/#/c/23154/13/be/src/util/runtime-profile.cc
File be/src/util/runtime-profile.cc:

http://gerrit.cloudera.org:8080/#/c/23154/13/be/src/util/runtime-profile.cc@2320
PS13, Line 2320: total
> This seems like actually changing the text profile format (and also json in
Done. I have explained it now.


http://gerrit.cloudera.org:8080/#/c/23154/18/be/src/util/runtime-profile.cc
File be/src/util/runtime-profile.cc:

http://gerrit.cloudera.org:8080/#/c/23154/18/be/src/util/runtime-profile.cc@2294
PS18, Line 2294:     result.total = std::accumulate(vals.begin(), vals.end(), 
0.0L);
> Total is not a good metric to present in many cases. We should insead print
I reverted to using 'total' as the preferred statistic. Please contemplate the 
following reasons.

My initial approach was also to implement 'count'. This implementation was 
inefficient, verbose and problematic at times.

1. Printing count instead of total for all counters would result in a very 
repetitive profile. Also, this would be unnecessarily too verbose for the user.

For example,
- BytesRead: count=3 (3) mean=81.31 KB (85258)
- RowsRead: count=3 (3) mean=2.00K (2005)
- BytesReadLocal: count=3 (3) mean=81.31 KB (85258)

In this way count would be repeated multiple times.

On the other hand, when using 'total' as a statistic, 'total bytes read' and 
'total memory consumed' are useful statistics.

2. Also majorly, with regard to "total" metric, it's not just "some" tests that 
need it.

It's actually more than 2500+ tests, which mandatorily require this and would 
fail without this.

We would be making an inefficient choice, if we were to repeatedly capture more 
regex groups(count and mean), while also adding unnecessary verbosity to 
profile.

3. Also, we can filter the type of values for which "total" can be printed.

i.e. time values would not have 'total'
     bytes and row counts would have 'total'

This was already implemented in one of the previous patches.

If out approach is to use 'num instances', which is already present in 
profile(at times not same as count), we would need to be in context, while 
searching and matching in that specific context.

This way, we would also be loosing the 10x faster, efficient regex searching, 
where we search all at once, instead of line by line.

For these reasons, I believe it would be better to use 'total' instead of 
'count' each time.


http://gerrit.cloudera.org:8080/#/c/23154/18/be/src/util/runtime-profile.cc@2320
PS18, Line 2320:   (*s) << prefix << "   - " << name << ": total="
> Looking at the test changes, tHis makes the output confusing. Stats should
As suggested in the previous comment...

We can filter the type of values for which aggregated "total" can be printed.

i.e. time values would not have 'total' metric aggregated
     while bytes and row counts would be aggregated into 'total'

In one of the previous patches, a similar method was already implemented once.


http://gerrit.cloudera.org:8080/#/c/23154/18/be/src/util/runtime-profile.cc@2397
PS18, Line 2397:       counter_json.AddMember("mean", stats.mean, 
document.GetAllocator());
> See above.
Done.

This mistakenly remained from one of the previous patches mentioned above. I 
have made it consistent now.


http://gerrit.cloudera.org:8080/#/c/23154/18/be/src/util/runtime-profile.cc@3355
PS18, Line 3355: }
> EXPANDED maybe better than UNAGGREGATED?
I believe, 'EXPANDED' might confuse users with 'EXTENDED' from the 
profile-verbosity attribute.

It would be easier for the user to see how UNAGGREGATED would be opposite of 
AGGREGATED.


http://gerrit.cloudera.org:8080/#/c/23154/13/testdata/workloads/functional-query/queries/QueryTest/binary-type.test
File testdata/workloads/functional-query/queries/QueryTest/binary-type.test:

http://gerrit.cloudera.org:8080/#/c/23154/13/testdata/workloads/functional-query/queries/QueryTest/binary-type.test@167
PS13, Line 167: aggregation(SUM, TotalKuduScanRoundTrips): 0
> This changes the semantics of the regexp, it is still correct though becaus
Done


http://gerrit.cloudera.org:8080/#/c/23154/13/testdata/workloads/functional-query/queries/QueryTest/compute-stats-incremental.test
File 
testdata/workloads/functional-query/queries/QueryTest/compute-stats-incremental.test:

http://gerrit.cloudera.org:8080/#/c/23154/13/testdata/workloads/functional-query/queries/QueryTest/compute-stats-incremental.test@544
PS13, Line 544: row_regex: Partition: year=2010 
/day=__HIVE_DEFAULT_PARTITION__\nNumModifiedRows: 1000
> I see several tests rewritten to use row_regex - what is the reason for thi
The same lines were trying to be matched or searched multiple times, these are 
not unique lines or searches.

In this case,
  "NumModifiedRows: 1000" was being repeatedly searched

   Although, the intention of the test was to match NumModifiedRows, when 
/day=__HIVE_DEFAULT_PARITION__.

This is the reason the lines were combined.

Mentioning the reason for this and other such test fixes would be too verbose 
for this large change. I'm sorry for not including them in the commit message.


http://gerrit.cloudera.org:8080/#/c/23154/13/tests/common/test_result_verifier.py
File tests/common/test_result_verifier.py:

http://gerrit.cloudera.org:8080/#/c/23154/13/tests/common/test_result_verifier.py@601
PS13, Line 601: #     aggregation(function, field_name): expected_value
> This is the same as line 597
Done


http://gerrit.cloudera.org:8080/#/c/23154/13/tests/common/test_result_verifier.py@624
PS13, Line 624: #      <field_name>: val.(<value>) ...
              : #
              : # Note: Fields witho
> I see several UNAGGREGATED in test files. What does this mean? I assumed th
I was told to mantain support for both aggregated and unaggregated profiles by 
@Kurt Deschler.

So, I have worked to maintain compatibility and successfully pass all the 2700+ 
tests for both aggregated and unaggregated profiles.


http://gerrit.cloudera.org:8080/#/c/23154/13/tests/common/test_result_verifier.py@647
PS13, Line 647:     "value": 0,
              :     "subgroups": 0,
              :     "function_operator": None,
> Is there a reason for not allowing this besides being non useful?
To search multiple counter fields for aggregation and row_regex in a really 
efficient) all at once search(20-50x faster), we are already using multiple 
capture groups for this. So, additional capture groups would unnecessarily 
complicate the logic, while not adding functionality.


http://gerrit.cloudera.org:8080/#/c/23154/13/tests/common/test_result_verifier.py@650
PS13, Line 650:     "expected_value": 0,
              :     "comparison_operator": None,
> This may make the change even bigger, but it would make more sense to me to
Done


http://gerrit.cloudera.org:8080/#/c/23154/13/tests/common/test_result_verifier.py@664
PS13, Line 664: fication:
> typo
Done


http://gerrit.cloudera.org:8080/#/c/23154/13/tests/common/test_result_verifier.py@698
PS13, Line 698:   if search_objec
> this looks like a debugging print()
Done. It was actually for testing purposes.


http://gerrit.cloudera.org:8080/#/c/23154/13/tests/common/test_result_verifier.py@700
PS13, Line 700:   return search_object
              :
> Does to code actually support non-aggregated profiles? Does it need to supp
Both aggregated and non-aggregated profiles are supported and compatible with 
tests.

We only check and consider "Profile Type: AGGREGATED" to know if profile is 
aggregated. If this is not present, it means the one of the following 2 cases, 
where it is better to assume profile as non-aggregated.

1. Profile Type: UNAGGREGATED is present, so unaggregated profile
2. Profile Type is not present at all, so it is a traditional/aggregated 
profile from before

In both cases, we can only assume it is an unaggregated profile.


http://gerrit.cloudera.org:8080/#/c/23154/13/tests/common/test_result_verifier.py@723
PS13, Line 723:     if negate_regex:
> this looks like a debugging print()
Done



--
To view, visit http://gerrit.cloudera.org:8080/23154
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If41d6322361fba82c946efd614cc7d28cb1c36e8
Gerrit-Change-Number: 23154
Gerrit-PatchSet: 19
Gerrit-Owner: Surya Hebbar <[email protected]>
Gerrit-Reviewer: Abhishek Rawat <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Kurt Deschler <[email protected]>
Gerrit-Reviewer: Riza Suminto <[email protected]>
Gerrit-Reviewer: Surya Hebbar <[email protected]>
Gerrit-Comment-Date: Mon, 10 Nov 2025 10:48:06 +0000
Gerrit-HasComments: Yes

Reply via email to