zhangqianqiong has uploaded a new patch set (#6). ( http://gerrit.cloudera.org:8080/21845 )

Change subject: IMPALA-13385: Compact the response of ddl operations
......................................................................

IMPALA-13385: Compact the response of ddl operations

At present, when catalogd responds to DDL operations, it sends the entire
table object. This can lead to a massive transfer of catalog metadata when
dealing with large Hive partitioned tables. In one of our customers'
clusters, there is a Hive partitioned table with over 4,000 columns, more
than 20,000 partitions, and over 10 million HDFS files. When executing an
`ALTER TABLE ADD PARTITION` operation on this table, the serialized catalog
object for the table exceeds the Java array size limit, resulting in the
following exception: `java.lang.OutOfMemoryError: Requested array size
exceeds VM limit`.

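To alleviate the issue, we can use TCompactProtocol instead of
TBinaryProtocol for Thrift serialization. TCompactProtocol encodes integers
as variable-length ZigZag varints and packs field headers into fewer bytes,
but it stores string contents unchanged, so it mainly shrinks non-string
data. To verify this, I conducted two tests.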
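To make the difference between the two encodings concrete, here is a minimal byte-count sketch. It is not libthrift code and not the code in this patch. It assumes each field's id is 1 to 15 greater than the previous field's, so the compact header fits in one byte, and it ignores struct, list, and map framing. The class name `ThriftFieldSizes` and its helper methods are hypothetical.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical, illustrative byte-count model of how Thrift's TBinaryProtocol
// and TCompactProtocol encode a single struct field. NOT libthrift code.
// Assumption: the field-id delta from the previous field is 1..15, so the
// compact field header is one byte; struct/list framing is ignored.
final class ThriftFieldSizes {

  // ZigZag maps signed ints to unsigned ones so small negatives stay small:
  // 0 -> 0, -1 -> 1, 1 -> 2, -2 -> 3, ...
  static int zigzag32(int n) {
    return (n << 1) ^ (n >> 31);
  }

  // Bytes needed for an unsigned base-128 varint (7 payload bits per byte).
  static int varintSize(int value) {
    int bytes = 1;
    while ((value & ~0x7F) != 0) { // more than 7 significant bits remain
      value >>>= 7;                // unsigned shift treats value as uint32
      bytes++;
    }
    return bytes;
  }

  // Binary: 1-byte type + 2-byte field id + fixed 4-byte i32, whatever the value.
  static int binaryI32Field(int value) {
    return 1 + 2 + 4;
  }

  // Compact: 1-byte header (delta << 4 | type) + ZigZag varint (1..5 bytes).
  static int compactI32Field(int value) {
    return 1 + varintSize(zigzag32(value));
  }

  // Binary: 1-byte type + 2-byte field id + 4-byte length + raw UTF-8 bytes.
  static int binaryStringField(String s) {
    return 1 + 2 + 4 + s.getBytes(StandardCharsets.UTF_8).length;
  }

  // Compact: 1-byte header + varint length + the same raw UTF-8 bytes.
  static int compactStringField(String s) {
    int len = s.getBytes(StandardCharsets.UTF_8).length;
    return 1 + varintSize(len) + len;
  }

  public static void main(String[] args) {
    int intValue = 2024;      // an integer field value
    String strValue = "2024"; // the same value stored as a string
    String path = "hdfs://nn/warehouse/tbl/p=2024/file_0001.parq"; // hypothetical

    System.out.printf("i32 %d: binary=%d compact=%d%n",
        intValue, binaryI32Field(intValue), compactI32Field(intValue));
    System.out.printf("string %s: binary=%d compact=%d%n",
        strValue, binaryStringField(strValue), compactStringField(strValue));
    System.out.printf("path (%d chars): binary=%d compact=%d%n",
        path.length(), binaryStringField(path), compactStringField(path));
  }
}
```

Under this model, a string field saves only a fixed 4 to 5 bytes of header and length overhead. Its payload bytes are unchanged, so the relative saving shrinks as strings get longer. Integer fields with small values shrink from 7 bytes to as few as 2.

In libthrift, switching protocols usually means passing a different `TProtocolFactory`, such as `TCompactProtocol.Factory`, to `TSerializer`/`TDeserializer`. The serializing side and the deserializing side must use the same protocol. That is why such a change has to touch both the Java and the C++ code paths. How this patch wires it through JNI is not shown here.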

   1. I built the 1k_col_tbl table using the scale_test_metadata script.
Nearly all columns in this table are of string type. The serialized size
was 408M before compaction and 369M after compaction, a reduction of only
about 9.5%. This confirms that TCompactProtocol does not significantly
compact string data, which aligns with our expectations.
   2. In the 1k_col_tbl table, all data columns except the id column are
empty, so these columns take up almost no space during serialization. I
therefore changed the partition columns from string type to int type and
regenerated the data. The serialized size was 89M before compaction and 49M
after compaction, a 45% reduction. This demonstrates the advantage of
TCompactProtocol for compacting non-string data types.
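For reference, the reductions in the two tests are computed from the rounded sizes reported above. The exact pre-rounding sizes are not given.

```latex
\text{reduction} = \frac{S_{\text{binary}} - S_{\text{compact}}}{S_{\text{binary}}}
\qquad
\text{Test 1: } \frac{408 - 369}{408} = \frac{39}{408} \approx 9.56\%
\qquad
\text{Test 2: } \frac{89 - 49}{89} = \frac{40}{89} \approx 44.9\%
```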

Change-Id: Idea9313c7f1f1596f3620e60b08a99efc7fa0466
---
M be/src/catalog/catalog.cc
M be/src/rpc/jni-thrift-util.h
M fe/src/main/java/org/apache/impala/service/JniCatalog.java
3 files changed, 33 insertions(+), 8 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/45/21845/6
--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Idea9313c7f1f1596f3620e60b08a99efc7fa0466
Gerrit-Change-Number: 21845
Gerrit-PatchSet: 6
Gerrit-Owner: zhangqianqiong <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Quanlong Huang <[email protected]>
Gerrit-Reviewer: Riza Suminto <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>
Gerrit-Reviewer: zhangqianqiong <[email protected]>
