Taras Bobrovytsky has uploaded a new patch set (#3). Change subject: IMPALA-4787: Optimize APPX_MEDIAN() memory usage ......................................................................
IMPALA-4787: Optimize APPX_MEDIAN() memory usage Before this change, ReservoirSample functions (such as APPX_MEDIAN()) allocated memory for 20,000 elements up front per grouping key. This caused inefficient memory usage for aggregations with many grouping keys. This patch fixes this by initially allocating memory for 20 elements. Once the buffer becomes full, we allocate room for 200 elements and copy the original buffer into the new one. We continue increasing the buffer size this way until the buffer has room for 20,000 elements as before. Testing: Ran BE and some relevant EE tests locally. No new tests were added, the existing tests should provide enough coverage. Memory benchmark: The following query was used to verify that this patch reduces memory usage: SELECT APPX_MEDIAN(ss_sold_date_sk) FROM tpcds.store_sales GROUP BY ss_customer_sk; Peak Mem in Agg nodes is reduced from 4.94 GB to 20.67 MB. Summary before: Operator #Hosts Avg Time Max Time #Rows Est. #Rows Peak Mem Est. Peak Mem Detail ---------------------------------------------------------------------------------------------------------------- 04:EXCHANGE 1 5.856ms 5.856ms 14.82K 15.21K 0 -1.00 B UNPARTITIONED 03:AGGREGATE 3 3s721ms 3s789ms 14.82K 15.21K 4.94 GB 10.00 MB FINALIZE 02:EXCHANGE 3 139.276ms 157.753ms 15.60K 15.21K 0 0 HASH(ss_customer_sk) 01:AGGREGATE 3 2s851ms 3s026ms 15.60K 15.21K 5.29 GB 10.00 MB STREAMING 00:SCAN HDFS 3 24.245ms 35.727ms 183.59K 183.59K 4.60 MB 384.00 MB tpcds.store_sales Summary after: Operator #Hosts Avg Time Max Time #Rows Est. #Rows Peak Mem Est. Peak Mem Detail --------------------------------------------------------------------------------------------------------------- 04:EXCHANGE 1 1.869ms 1.869ms 14.82K 15.21K 0 -1.00 B UNPARTITIONED 03:AGGREGATE 3 34.225ms 37.978ms 14.82K 15.21K 20.67 MB 10.00 MB FINALIZE 02:EXCHANGE 3 860.854us 1.024ms 15.60K 15.21K 0 0 HASH(ss_customer_sk) 01:AGGREGATE 3 74.048ms 93.605ms 15.60K 15.21K 9.99 MB 10.00 MB STREAMING 00:SCAN HDFS 3 68.162ms 78.181ms 183.59K 183.59K 3.84 MB 384.00 MB tpcds.store_sales Change-Id: I99adaad574d4fb0a3cf38c6cbad8b2a23df12968 --- M be/src/exprs/aggregate-functions-ir.cc M testdata/workloads/functional-query/queries/QueryTest/alloc-fail-init.test 2 files changed, 141 insertions(+), 33 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/25/6025/3 -- To view, visit http://gerrit.cloudera.org:8080/6025 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I99adaad574d4fb0a3cf38c6cbad8b2a23df12968 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras Bobrovytsky <tbobrovyt...@cloudera.com> Gerrit-Reviewer: Alex Behm <alex.b...@cloudera.com> Gerrit-Reviewer: Marcel Kornacker <mar...@cloudera.com> Gerrit-Reviewer: Matthew Jacobs <m...@cloudera.com> Gerrit-Reviewer: Mostafa Mokhtar <mmokh...@cloudera.com> Gerrit-Reviewer: Taras Bobrovytsky <tbobrovyt...@cloudera.com>