Impala Public Jenkins has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/22325 )

Change subject: IMPALA-13655: UPDATE redundantly accumulates memory in HDFS 
WRITER
......................................................................

IMPALA-13655: UPDATE redundantly accumulates memory in HDFS WRITER

When IcebergUpdateImpl created the table sink it didn't set
'inputIsClustered' to true. Therefore HdfsTableSink expected
random input and kept the output writers open for every partition,
which resulted in high memory consumption and potentially an
OOM error when the number of partitions are high.

Since we actually sort the rows before the sink we can set
'inputIsClustered' to true, which means HdfsTableSink can write
files one by one, because whenever it gets a row that belongs
to a new partition it knows that it can close the current output
writer, and open a new one.

Testing:
 - e2e regression test

Change-Id: I9bad335cc946364fc612e8aaf90858eaabd7c4af
Reviewed-on: http://gerrit.cloudera.org:8080/22325
Reviewed-by: Impala Public Jenkins <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>
---
M fe/src/main/java/org/apache/impala/analysis/IcebergUpdateImpl.java
M 
testdata/workloads/functional-query/queries/QueryTest/iceberg-update-partitions.test
2 files changed, 23 insertions(+), 1 deletion(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified

--
To view, visit http://gerrit.cloudera.org:8080/22325
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I9bad335cc946364fc612e8aaf90858eaabd7c4af
Gerrit-Change-Number: 22325
Gerrit-PatchSet: 3
Gerrit-Owner: Noemi Pap-Takacs <[email protected]>
Gerrit-Reviewer: Daniel Becker <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Peter Rozsa <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>

Reply via email to