Yida Wu has uploaded this change for review.
( http://gerrit.cloudera.org:8080/21235 )


Change subject: IMPALA-12960: Fix Incorrect RowsPassedThrough Metric in 
Streaming Aggregation
......................................................................

IMPALA-12960: Fix Incorrect RowsPassedThrough Metric in Streaming Aggregation

This patch fixes a bug in the RowsPassedThrough metric reported in
the query profile when Streaming Aggregation is used. The issue comes
from the logic of the AddBatchStreaming() function: the number of rows
in the output batch isn't necessarily initialized to 0, yet the
function uses num_rows() of the output batch directly as the number
of rows returned and passed through by this specific aggregator.
This discrepancy can significantly skew the returned and passed
through row counts, as well as the reduction rate calculated during
hash table expansion in Streaming Aggregation. The differences are
especially large when using the rollup function.

The solution is to calculate the actual number of rows added
to the output batch within each round of the AddBatchStreaming()
function.
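
The counting idea can be sketched as below. This is a minimal
illustration only, not the code in grouping-aggregator.cc: the Batch
struct and the PassThrough() helper are hypothetical stand-ins, and
rows are plain ints for brevity. It assumes the output batch may
already hold rows when the function is entered, so the count for one
call is taken as the size after minus the size before.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an output row batch; not Impala's RowBatch.
struct Batch {
  std::vector<int> rows;
  std::size_t num_rows() const { return rows.size(); }
};

// Appends every input row to 'out' and returns the number of rows that
// this call added. Rows already present in 'out' on entry are excluded
// from the returned count.
std::size_t PassThrough(const std::vector<int>& in, Batch* out) {
  assert(out != nullptr);
  const std::size_t rows_before = out->num_rows();
  for (int row : in) out->rows.push_back(row);
  return out->num_rows() - rows_before;
}
```

With an output batch that already holds three rows, passing two more
rows through yields a count of 2, whereas reading num_rows() directly
would report 5.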

Tests:
Passed exhaustive tests.
Added a corresponding case in tpch-passthrough-aggregations.test.

Change-Id: I59205a4b06824ee1607a25e906db1f96dc4eda9f
---
M be/src/exec/grouping-aggregator.cc
M testdata/workloads/tpch/queries/tpch-passthrough-aggregations.test
2 files changed, 27 insertions(+), 2 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/35/21235/1
--
To view, visit http://gerrit.cloudera.org:8080/21235
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I59205a4b06824ee1607a25e906db1f96dc4eda9f
Gerrit-Change-Number: 21235
Gerrit-PatchSet: 1
Gerrit-Owner: Yida Wu <[email protected]>
