Noemi Pap-Takacs has uploaded this change for review. ( http://gerrit.cloudera.org:8080/22325 )
Change subject: IMPALA-13655: UPDATE redundantly accumulates memory in HDFS WRITER
......................................................................

IMPALA-13655: UPDATE redundantly accumulates memory in HDFS WRITER

When IcebergUpdateImpl created the table sink, it didn't set
'inputIsClustered' to true. Therefore HdfsTableSink expected random
input and kept an output writer open for every partition, which
resulted in high memory consumption and potentially an OOM error
when the number of partitions is high.

Since we actually sort the rows before the sink, we can set
'inputIsClustered' to true. This means HdfsTableSink can write files
one by one: whenever it gets a row that belongs to a new partition,
it knows that it can close the current output writer and open a new
one.

Testing:
 - e2e regression test

Change-Id: I9bad335cc946364fc612e8aaf90858eaabd7c4af
---
M fe/src/main/java/org/apache/impala/analysis/IcebergUpdateImpl.java
M testdata/workloads/functional-query/queries/QueryTest/iceberg-update-partitions.test
2 files changed, 23 insertions(+), 1 deletion(-)

  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/25/22325/1
--
To view, visit http://gerrit.cloudera.org:8080/22325
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I9bad335cc946364fc612e8aaf90858eaabd7c4af
Gerrit-Change-Number: 22325
Gerrit-PatchSet: 1
Gerrit-Owner: Noemi Pap-Takacs <[email protected]>
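
The effect described in the commit message can be illustrated with a small
standalone Java sketch. This is not Impala code: the class
ClusteredSinkSketch and the method peakOpenWriters are hypothetical names
invented for this illustration, and partitions are modeled as plain strings.
It only shows why a sink that knows its input is sorted by partition needs
one open writer at a time, while a sink that assumes random input ends up
holding one writer per partition.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Objects;
import java.util.Set;

// Hypothetical model of a partitioned sink; not the real HdfsTableSink.
public class ClusteredSinkSketch {

    // Returns the peak number of writers that are open at the same time
    // while consuming rows identified by their partition key.
    public static int peakOpenWriters(List<String> partitionKeys,
                                      boolean inputIsClustered) {
        Set<String> openWriters = new HashSet<>();
        String currentKey = null;
        int peak = 0;
        for (String key : partitionKeys) {
            if (inputIsClustered) {
                // Sorted input: a new key means the previous partition is
                // finished, so its writer can be closed before opening a new one.
                if (!Objects.equals(key, currentKey)) {
                    openWriters.clear();
                    openWriters.add(key);
                    currentKey = key;
                }
            } else {
                // Random input assumed: any partition may reappear later,
                // so every writer stays open until the end.
                openWriters.add(key);
            }
            peak = Math.max(peak, openWriters.size());
        }
        return peak;
    }

    public static void main(String[] args) {
        // Rows already sorted by partition, as they are before the sink.
        List<String> sortedRows = List.of("p=1", "p=1", "p=2", "p=2", "p=3");
        System.out.println(peakOpenWriters(sortedRows, false)); // prints 3
        System.out.println(peakOpenWriters(sortedRows, true));  // prints 1
    }
}
```

In this model the peak writer count grows with the number of partitions in
the unclustered case and stays at one in the clustered case, which matches
the memory behavior the change is meant to fix.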
