Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/22192 )
Change subject: IMPALA-13598: OPTIMIZE redundantly accumulates memory in HDFS WRITER
......................................................................

IMPALA-13598: OPTIMIZE redundantly accumulates memory in HDFS WRITER

When OptimizeStmt created the table sink, it didn't set
'inputIsClustered' to true. HdfsTableSink therefore expected randomly
ordered input and kept an output writer open for every partition,
which resulted in high memory consumption and potentially an OOM
error when the number of partitions is high.

Since we actually sort the rows before the sink, we can set
'inputIsClustered' to true. HdfsTableSink can then write files one by
one: whenever it gets a row that belongs to a new partition, it knows
that it can close the current output writer and open a new one.

Testing:
 * added e2e test

Change-Id: I8d451c50c4b6dff9433ab105493051bee106bc63
Reviewed-on: http://gerrit.cloudera.org:8080/22192
Reviewed-by: Impala Public Jenkins <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>
---
M fe/src/main/java/org/apache/impala/analysis/OptimizeStmt.java
M testdata/workloads/functional-query/queries/QueryTest/iceberg-optimize.test
2 files changed, 24 insertions(+), 2 deletions(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified

--
To view, visit http://gerrit.cloudera.org:8080/22192
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I8d451c50c4b6dff9433ab105493051bee106bc63
Gerrit-Change-Number: 22192
Gerrit-PatchSet: 5
Gerrit-Owner: Zoltan Borok-Nagy <[email protected]>
Gerrit-Reviewer: Daniel Becker <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Noemi Pap-Takacs <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>
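
Illustrative note on the change description: the effect of 'inputIsClustered' on the number of open writers can be sketched with a toy model. This is not Impala's HdfsTableSink code. The class name ClusteredSinkSketch, the method peakOpenWriters, and the string partition keys are all hypothetical, and each "writer" is modeled only as an entry in a set.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Toy model of a partitioned table sink; hypothetical, not Impala code. */
public class ClusteredSinkSketch {

  /**
   * Returns the peak number of simultaneously open writers while consuming
   * rows, each represented only by its partition key.
   */
  static int peakOpenWriters(List<String> rowPartitionKeys, boolean inputIsClustered) {
    Set<String> openWriters = new HashSet<>();
    String currentKey = null;
    int peak = 0;
    for (String key : rowPartitionKeys) {
      // With clustered input, a new key means the previous partition is
      // finished, so its writer can be closed before the next one opens.
      if (inputIsClustered && currentKey != null && !key.equals(currentKey)) {
        openWriters.remove(currentKey);
      }
      // Without clustering, any partition may reappear later, so every
      // writer stays open until the end of the input.
      openWriters.add(key);
      currentKey = key;
      peak = Math.max(peak, openWriters.size());
    }
    return peak;
  }

  public static void main(String[] args) {
    // Rows already sorted by partition, as they are before the sink in OPTIMIZE.
    List<String> sortedRows = List.of("p=1", "p=1", "p=2", "p=3", "p=3");
    System.out.println("unclustered peak: " + peakOpenWriters(sortedRows, false));
    System.out.println("clustered peak:   " + peakOpenWriters(sortedRows, true));
  }
}
```

In this model the same sorted input keeps one writer per distinct partition open when the sink is not told the input is clustered, and a single writer when it is, which mirrors the memory behavior described in the commit message.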
