zhangqianqiong has uploaded a new patch set (#6). ( http://gerrit.cloudera.org:8080/21845 )
Change subject: IMPALA-13385: Compact the response of ddl operations
......................................................................

IMPALA-13385: Compact the response of ddl operations

At present, when catalogd responds to a DDL operation, it sends the entire table object. For large Hive partitioned tables, this can mean transferring a massive amount of catalog metadata. One of our customers' clusters has a Hive partitioned table with over 4,000 columns, more than 20,000 partitions, and over 10 million HDFS files. When an `ALTER TABLE ADD PARTITION` operation is executed on this table, the serialized catalog object for the table exceeds the Java array size limit, resulting in the following exception: `java.lang.OutOfMemoryError: Requested array size exceeds VM limit`.

To alleviate the issue, we can use TCompactProtocol instead of TBinaryProtocol for Thrift serialization. TCompactProtocol writes integers and field headers with variable-length encodings, so non-string types shrink considerably, while string contents are written as raw bytes in both protocols. To verify this, I ran two tests:

1. I built the 1k_col_tbl table using the scale_test_metadata script. Nearly all columns in this table are of string type. The serialized size was 408M with TBinaryProtocol and 369M with TCompactProtocol, a reduction of only about 9.5%. This confirms that TCompactProtocol does not significantly compact string data, as expected.

2. In the 1k_col_tbl table, all data columns except the id column are empty, so they take up almost no space during serialization. I therefore changed the partition columns from string type to int type and rebuilt the data. The serialized size was 89M with TBinaryProtocol and 49M with TCompactProtocol, a 45% reduction. This demonstrates the advantage of TCompactProtocol for non-string data types.
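The difference between the two tests can be illustrated with a small sketch. The class below is hypothetical (ThriftSizeSketch and its methods are not part of libthrift or Impala); it only models, under simplifying assumptions, the per-value wire sizes described in the Thrift protocol specifications. TBinaryProtocol writes an i32 as 4 fixed bytes and a string as a 4-byte length plus its bytes. TCompactProtocol writes an i32 as a zigzag-encoded varint (1 to 5 bytes) and a string as a varint length plus the same bytes. Field headers, which the compact protocol also shrinks (usually 1 byte instead of 3), are ignored here.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper: models per-value wire sizes of Thrift's binary and
// compact protocols. It is not part of libthrift or Impala.
final class ThriftSizeSketch {

    // Zigzag-encode a signed 32-bit int so that small magnitudes
    // (positive or negative) map to small unsigned values.
    static int zigzag32(int n) {
        return (n << 1) ^ (n >> 31);
    }

    // Bytes needed to write v (treated as unsigned) as a base-128 varint.
    static int varintSize(int v) {
        int bytes = 1;
        while ((v & ~0x7F) != 0) {
            bytes++;
            v >>>= 7;
        }
        return bytes;
    }

    // TBinaryProtocol: an i32 is always 4 big-endian bytes.
    static int binaryI32Size(int value) {
        return 4;
    }

    // TCompactProtocol: an i32 is zigzag-encoded, then written as a varint (1 to 5 bytes).
    static int compactI32Size(int value) {
        return varintSize(zigzag32(value));
    }

    // TBinaryProtocol: a string is a 4-byte length followed by its UTF-8 bytes.
    static int binaryStringSize(String s) {
        return 4 + s.getBytes(StandardCharsets.UTF_8).length;
    }

    // TCompactProtocol: a string is a varint length followed by the same UTF-8 bytes.
    static int compactStringSize(String s) {
        int len = s.getBytes(StandardCharsets.UTF_8).length;
        return varintSize(len) + len;
    }

    public static void main(String[] args) {
        int partitionValue = 2024;
        String filePath = "/warehouse/t/p=2024/f.parq";

        // Prints "i32 2024: binary=4 compact=2"
        System.out.println("i32 " + partitionValue + ": binary=" + binaryI32Size(partitionValue)
                + " compact=" + compactI32Size(partitionValue));
        // Prints "string (26 bytes): binary=30 compact=27"
        System.out.println("string (" + filePath.length() + " bytes): binary="
                + binaryStringSize(filePath) + " compact=" + compactStringSize(filePath));
    }
}
```

Under these rules an int value such as 2024 shrinks from 4 bytes to 2, while a 26-byte path string only shrinks from 30 bytes to 27. The saving on strings is a few bytes of length prefix regardless of string length, which is consistent with the direction of the two measurements above.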
Change-Id: Idea9313c7f1f1596f3620e60b08a99efc7fa0466
---
M be/src/catalog/catalog.cc
M be/src/rpc/jni-thrift-util.h
M fe/src/main/java/org/apache/impala/service/JniCatalog.java
3 files changed, 33 insertions(+), 8 deletions(-)

git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/45/21845/6
--
To view, visit http://gerrit.cloudera.org:8080/21845

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Idea9313c7f1f1596f3620e60b08a99efc7fa0466
Gerrit-Change-Number: 21845
Gerrit-PatchSet: 6
Gerrit-Owner: zhangqianqiong <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Quanlong Huang <[email protected]>
Gerrit-Reviewer: Riza Suminto <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>
Gerrit-Reviewer: zhangqianqiong <[email protected]>
