Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/19391 )
Change subject: IMPALA-11812: Deduplicate column schema in hmsPartitions
......................................................................

IMPALA-11812: Deduplicate column schema in hmsPartitions

A list of HMS Partitions is created in many catalogd workloads, e.g.
table loading, or bulk altering of partitions by ComputeStats or
AlterTableRecoverPartitions. Currently, each hmsPartition holds its own
list of column schemas, i.e. a List<FieldSchema>. This results in a
large number of FieldSchema instances when the table is wide and many
partitions need to be loaded or operated on. Although the column name
and comment strings are interned, the FieldSchema objects can still
occupy the majority of the heap. See the histogram in the JIRA
description.

In practice, the hmsPartition instances of a table can share the
table-level column schema, since Impala doesn't respect the
partition-level schema. This patch replaces the column list in the
StorageDescriptor of each hmsPartition with the table-level column list
to remove the duplication.

Also adds progress logs to batch HMS operations, and avoids misleading
logs when the event-processor is disabled.

Tests:
 - Ran exhaustive tests
 - Added tests on wide table operations that hit OOM errors without
   this fix.
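The sharing idea described above can be illustrated with a minimal, self-contained sketch. It is not the code from this patch: FieldSchema, StorageDescriptor and Partition below are simplified, hypothetical stand-ins for the Hive Metastore classes of the same names, and shareTableColumns and buildPartitions are helper names invented for illustration only.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for the HMS classes; only the fields needed to show
// the idea are modeled. This is a sketch, not the actual Impala change.
class SchemaDedupSketch {
  static class FieldSchema {
    final String name;
    final String type;
    FieldSchema(String name, String type) { this.name = name; this.type = type; }
  }

  static class StorageDescriptor {
    List<FieldSchema> cols;
    StorageDescriptor(List<FieldSchema> cols) { this.cols = cols; }
  }

  static class Partition {
    final StorageDescriptor sd;
    Partition(StorageDescriptor sd) { this.sd = sd; }
  }

  // Simulates what a bulk fetch returns: every partition carries its own
  // deep copy of the column list (n * cols FieldSchema objects in total).
  static List<Partition> buildPartitions(List<FieldSchema> tableCols, int n) {
    List<Partition> parts = new ArrayList<>();
    for (int i = 0; i < n; i++) {
      List<FieldSchema> copy = new ArrayList<>();
      for (FieldSchema c : tableCols) copy.add(new FieldSchema(c.name, c.type));
      parts.add(new Partition(new StorageDescriptor(copy)));
    }
    return parts;
  }

  // Hypothetical helper: point each partition's column list at the single
  // table-level list so the per-partition copies can be garbage collected.
  static void shareTableColumns(List<Partition> parts, List<FieldSchema> tableCols) {
    for (Partition p : parts) {
      if (p.sd != null) p.sd.cols = tableCols;
    }
  }

  public static void main(String[] args) {
    List<FieldSchema> tableCols = new ArrayList<>();
    for (int i = 0; i < 1000; i++) tableCols.add(new FieldSchema("c" + i, "int"));

    List<Partition> parts = buildPartitions(tableCols, 100);
    shareTableColumns(parts, tableCols);

    // After sharing, every partition references the same list instance.
    boolean allShared = true;
    for (Partition p : parts) allShared &= (p.sd.cols == tableCols);
    System.out.println("all partitions share the table column list: " + allShared);
  }
}
```

The sketch relies on the assumption stated in the commit message, namely that partition-level schemas are not respected, so replacing them with the table-level list loses nothing the reader depends on.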
Change-Id: I511ecca0ace8bea4c24a19a54fb0a75390e50c4d
Reviewed-on: http://gerrit.cloudera.org:8080/19391
Reviewed-by: Aman Sinha <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>
---
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
M fe/src/main/java/org/apache/impala/catalog/local/DirectMetaProvider.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/util/MetaStoreUtil.java
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
M tests/common/custom_cluster_test_suite.py
A tests/custom_cluster/test_wide_table_operations.py
9 files changed, 212 insertions(+), 45 deletions(-)

Approvals:
  Aman Sinha: Looks good to me, approved
  Impala Public Jenkins: Verified

--
To view, visit http://gerrit.cloudera.org:8080/19391
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I511ecca0ace8bea4c24a19a54fb0a75390e50c4d
Gerrit-Change-Number: 19391
Gerrit-PatchSet: 8
Gerrit-Owner: Quanlong Huang <[email protected]>
Gerrit-Reviewer: Aman Sinha <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Qifan Chen <[email protected]>
Gerrit-Reviewer: Quanlong Huang <[email protected]>
