Yida Wu has uploaded this change for review. ( http://gerrit.cloudera.org:8080/21235 )

Change subject: IMPALA-12960: Fix Incorrect RowsPassedThrough Metric in Streaming Aggregation
......................................................................

IMPALA-12960: Fix Incorrect RowsPassedThrough Metric in Streaming Aggregation

This patch fixes a bug in the RowsPassedThrough metric in the query
profile when Streaming Aggregation is used.

The issue stems from the logic in AddBatchStreaming(): the number of
rows in the output batch isn't necessarily 0 when the function is
called, yet the function uses num_rows() of the output batch directly
as the number of rows returned and passed through by this specific
aggregator.

This discrepancy can significantly distort the returned and
passed-through row counts, as well as the reduction rates calculated
during hash table expansion in Streaming Aggregation. The differences
can be especially large when the rollup function is used.

The solution is to calculate the actual number of rows added to the
output batch within each call to AddBatchStreaming().

Tests:
Passed exhaustive tests.
Added a corresponding case in tpch-passthrough-aggregations.test.

Change-Id: I59205a4b06824ee1607a25e906db1f96dc4eda9f
---
M be/src/exec/grouping-aggregator.cc
M testdata/workloads/tpch/queries/tpch-passthrough-aggregations.test
2 files changed, 27 insertions(+), 2 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/35/21235/1
--
To view, visit http://gerrit.cloudera.org:8080/21235
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I59205a4b06824ee1607a25e906db1f96dc4eda9f
Gerrit-Change-Number: 21235
Gerrit-PatchSet: 1
Gerrit-Owner: Yida Wu <[email protected]>
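
For illustration only, the accounting error described in the commit message can be sketched in a few lines of standalone C++. This is not the code from grouping-aggregator.cc: OutputBatch, PassthroughCounter, AddBatchBuggy and AddBatchFixed are hypothetical names, and the sketch simply assumes the output batch may already hold rows when the function is entered.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for an output row batch; not Impala's RowBatch.
struct OutputBatch {
  std::vector<int64_t> rows;
  int64_t num_rows() const { return static_cast<int64_t>(rows.size()); }
};

// Hypothetical aggregator that passes every input row straight through
// and keeps a running counter of the rows it has passed through.
struct PassthroughCounter {
  int64_t rows_passed_through = 0;

  // Buggy accounting: treats the batch total as this call's contribution,
  // which is only right if 'out' was empty on entry.
  void AddBatchBuggy(const std::vector<int64_t>& in, OutputBatch* out) {
    for (int64_t row : in) out->rows.push_back(row);
    rows_passed_through += out->num_rows();
  }

  // Fixed accounting: counts only the rows appended during this call.
  void AddBatchFixed(const std::vector<int64_t>& in, OutputBatch* out) {
    const int64_t num_rows_before = out->num_rows();
    for (int64_t row : in) out->rows.push_back(row);
    const int64_t num_rows_added = out->num_rows() - num_rows_before;
    assert(num_rows_added >= 0);
    rows_passed_through += num_rows_added;
  }
};
```

With an output batch that already holds three rows and an input of two rows, the buggy variant adds 5 to the counter while the fixed variant adds 2. The same over-count would feed into any reduction-rate estimate derived from the counter.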
