Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/17654 )
Change subject: IMPALA-10627: Use standard parquet-related Iceberg table properties
......................................................................


Patch Set 4:

(3 comments)

The change looks great! I only had minor comments.

http://gerrit.cloudera.org:8080/#/c/17654/4/be/src/exec/parquet/hdfs-parquet-table-writer.cc
File be/src/exec/parquet/hdfs-parquet-table-writer.cc:

http://gerrit.cloudera.org:8080/#/c/17654/4/be/src/exec/parquet/hdfs-parquet-table-writer.cc@1247
PS4, Line 1247: uint64_t HdfsParquetTableWriter::default_block_size() const {
:   int64_t block_size = 0;
:   if (state_->query_options().__isset.parquet_file_size &&
:       state_->query_options().parquet_file_size > 0) {
:     // If the user specified a value explicitly, use it. InitNewFile() will verify that
:     // the actual file's block size is sufficient.
:     block_size = state_->query_options().parquet_file_size;
:   } else if (table_desc_->IsIcebergTable() &&
:       table_desc_->IcebergParquetRowGroupSize() > 0) {
:     // If the user specified a value explicitly, use it. InitNewFile() will verify that
:     // the actual file's block size is sufficient.
:     block_size = table_desc_->IcebergParquetRowGroupSize();
:   } else {
:     block_size = HDFS_BLOCK_SIZE;
:     // Blocks are usually HDFS_BLOCK_SIZE bytes, unless there are many columns, in
:     // which case a per-column minimum kicks in.
:     block_size = max(block_size, MinBlockSize(columns_.size()));
:   }
:   // HDFS does not like block sizes that are not aligned
:   return BitUtil::RoundUp(block_size, HDFS_BLOCK_ALIGNMENT);
: }
:
: int64_t HdfsParquetTableWriter::default_plain_page_size() const {
:   int64_t plain_page_size = 0;
:   if (table_desc_->IsIcebergTable()) {
:     plain_page_size = table_desc_->IcebergParquetPlainPageSize();
:   }
:
:   if (plain_page_size <= 0) plain_page_size = DEFAULT_DATA_PAGE_SIZE;
:   return plain_page_size;
: }
:
: int64_t HdfsParquetTableWriter::dict_page_size() const {
:   int64_t dict_page_size = 0;
:   if (table_desc_->IsIcebergTable()) {
:     dict_page_size = table_desc_->IcebergParquetDictPageSize();
:   }
:
:   if (dict_page_size <= 0) dict_page_size = DEFAULT_DATA_PAGE_SIZE;
:   return dict_page_size;
: }
Optional: We could have a data member for each, e.g. default_block_size_, dict_page_size_, etc. And have Configure(), ConfigureForIceberg() methods that would set these values accordingly.

This way the Iceberg-related code parts could be separated from the common code parts. Also, we could easily check the values during debugging.

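To make the optional suggestion concrete, here is a minimal standalone sketch of what I have in mind. It is not meant as the actual patch: QueryOptionsStub, TableDescStub, WriterSizeConfig and the k* constants are hypothetical stand-ins I made up for illustration, the constant values and the MinBlockSize() formula are placeholders rather than the writer's real ones, and only the member names and the Configure() / ConfigureForIceberg() split come from the comment above. The intent is to keep the same precedence as the quoted code (query option, then Iceberg table property, then default).

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Placeholder constants for this sketch only; the real ones live in the writer.
constexpr int64_t kHdfsBlockSize = 256LL * 1024 * 1024;
constexpr int64_t kHdfsBlockAlignment = 1024 * 1024;
constexpr int64_t kDefaultDataPageSize = 64 * 1024;

// Hypothetical stand-in for the relevant query option (<= 0 means "not set").
struct QueryOptionsStub {
  int64_t parquet_file_size = 0;
};

// Hypothetical stand-in for the table descriptor accessors used in the patch.
struct TableDescStub {
  bool is_iceberg = false;
  int64_t iceberg_row_group_size = 0;
  int64_t iceberg_plain_page_size = 0;
  int64_t iceberg_dict_page_size = 0;
};

// Rounds 'value' up to the next multiple of 'factor'.
inline int64_t RoundUp(int64_t value, int64_t factor) {
  assert(factor > 0);
  return (value + factor - 1) / factor * factor;
}

// Placeholder per-column minimum; the formula is an assumption of this sketch.
inline int64_t MinBlockSize(int num_cols) {
  return 3 * kDefaultDataPageSize * num_cols;
}

class WriterSizeConfig {
 public:
  WriterSizeConfig(const QueryOptionsStub& opts, const TableDescStub& table,
      int num_cols) {
    Configure(opts, num_cols);
    if (table.is_iceberg) ConfigureForIceberg(opts, table);
    // HDFS does not like block sizes that are not aligned.
    default_block_size_ = RoundUp(default_block_size_, kHdfsBlockAlignment);
  }

  int64_t default_block_size() const { return default_block_size_; }
  int64_t default_plain_page_size() const { return default_plain_page_size_; }
  int64_t dict_page_size() const { return dict_page_size_; }

 private:
  // Common part: query option if set, otherwise the defaults.
  void Configure(const QueryOptionsStub& opts, int num_cols) {
    if (opts.parquet_file_size > 0) {
      default_block_size_ = opts.parquet_file_size;
    } else {
      default_block_size_ = std::max(kHdfsBlockSize, MinBlockSize(num_cols));
    }
    default_plain_page_size_ = kDefaultDataPageSize;
    dict_page_size_ = kDefaultDataPageSize;
  }

  // Iceberg part: table properties override the defaults, but an explicitly
  // set query option still wins for the block size.
  void ConfigureForIceberg(const QueryOptionsStub& opts,
      const TableDescStub& table) {
    if (opts.parquet_file_size <= 0 && table.iceberg_row_group_size > 0) {
      default_block_size_ = table.iceberg_row_group_size;
    }
    if (table.iceberg_plain_page_size > 0) {
      default_plain_page_size_ = table.iceberg_plain_page_size;
    }
    if (table.iceberg_dict_page_size > 0) {
      dict_page_size_ = table.iceberg_dict_page_size;
    }
  }

  int64_t default_block_size_ = 0;
  int64_t default_plain_page_size_ = 0;
  int64_t dict_page_size_ = 0;
};
```

With something along these lines the values are computed once, the Iceberg-specific overrides sit in one method, and the three members can be inspected directly in a debugger.
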
http://gerrit.cloudera.org:8080/#/c/17654/4/fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java
File fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java:

http://gerrit.cloudera.org:8080/#/c/17654/4/fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java@96
PS4, Line 96: coded
codec


http://gerrit.cloudera.org:8080/#/c/17654/4/fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java:

http://gerrit.cloudera.org:8080/#/c/17654/4/fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java@191
PS4, Line 191: if (IcebergTable.PARQUET_COMPRESSION_CODEC.equals(propKey)) return true;
: if (IcebergTable.PARQUET_COMPRESSION_LEVEL.equals(propKey)) return true;
: if (IcebergTable.PARQUET_ROW_GROUP_SIZE.equals(propKey)) return true;
: if (IcebergTable.PARQUET_PLAIN_PAGE_SIZE.equals(propKey)) return true;
: if (IcebergTable.PARQUET_DICT_PAGE_SIZE.equals(propKey)) return true;
I think these properties should be stored at the Iceberg table level as well.


--
To view, visit http://gerrit.cloudera.org:8080/17654
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3b8aa9a52c13c41b48310d2f7c9c7426e1ff5f23
Gerrit-Change-Number: 17654
Gerrit-PatchSet: 4
Gerrit-Owner: Attila Jeges <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>
Gerrit-Reviewer: wangsheng <[email protected]>
Gerrit-Comment-Date: Wed, 14 Jul 2021 12:50:51 +0000
Gerrit-HasComments: Yes
