Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/17654 )

Change subject: IMPALA-10627: Use standard parquet-related Iceberg table properties
......................................................................


Patch Set 4:

(3 comments)

The change looks great! I only had minor comments.

http://gerrit.cloudera.org:8080/#/c/17654/4/be/src/exec/parquet/hdfs-parquet-table-writer.cc
File be/src/exec/parquet/hdfs-parquet-table-writer.cc:

http://gerrit.cloudera.org:8080/#/c/17654/4/be/src/exec/parquet/hdfs-parquet-table-writer.cc@1247
PS4, Line 1247: uint64_t HdfsParquetTableWriter::default_block_size() const {
              :   int64_t block_size = 0;
              :   if (state_->query_options().__isset.parquet_file_size &&
              :       state_->query_options().parquet_file_size > 0) {
              :     // If the user specified a value explicitly, use it. InitNewFile() will verify that
              :     // the actual file's block size is sufficient.
              :     block_size = state_->query_options().parquet_file_size;
              :   } else if (table_desc_->IsIcebergTable() &&
              :       table_desc_->IcebergParquetRowGroupSize() > 0) {
              :     // If the user specified a value explicitly, use it. InitNewFile() will verify that
              :     // the actual file's block size is sufficient.
              :     block_size = table_desc_->IcebergParquetRowGroupSize();
              :   } else {
              :     block_size = HDFS_BLOCK_SIZE;
              :     // Blocks are usually HDFS_BLOCK_SIZE bytes, unless there are many columns, in
              :     // which case a per-column minimum kicks in.
              :     block_size = max(block_size, MinBlockSize(columns_.size()));
              :   }
              :   // HDFS does not like block sizes that are not aligned
              :   return BitUtil::RoundUp(block_size, HDFS_BLOCK_ALIGNMENT);
              : }
              :
              : int64_t HdfsParquetTableWriter::default_plain_page_size() const {
              :   int64_t plain_page_size = 0;
              :   if (table_desc_->IsIcebergTable()) {
              :     plain_page_size = table_desc_->IcebergParquetPlainPageSize();
              :   }
              :
              :   if (plain_page_size <= 0) plain_page_size = DEFAULT_DATA_PAGE_SIZE;
              :   return plain_page_size;
              : }
              :
              : int64_t HdfsParquetTableWriter::dict_page_size() const {
              :   int64_t dict_page_size = 0;
              :   if (table_desc_->IsIcebergTable()) {
              :     dict_page_size = table_desc_->IcebergParquetDictPageSize();
              :   }
              :
              :   if (dict_page_size <= 0) dict_page_size = DEFAULT_DATA_PAGE_SIZE;
              :   return dict_page_size;
              : }
Optional: We could have a data member for each of these, e.g. 
default_block_size_, dict_page_size_, etc., and add Configure() and 
ConfigureForIceberg() methods that set the values accordingly. This way the 
Iceberg-specific code would be separated from the common code, and the values 
would be easy to inspect during debugging.
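
A rough sketch of what I mean, not a drop-in patch. WriterOptions, 
IcebergTableProps, ParquetWriterConfig and the k* constants are hypothetical 
stand-ins for the query options, the table descriptor, the writer and the 
existing constants; the constant values and the per-column minimum are 
placeholders. It keeps the precedence from the patch (query option first, then 
the Iceberg property, then the default):

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for the real query options and table descriptor.
// A value <= 0 means "not set".
struct WriterOptions {
  int64_t parquet_file_size = 0;
};

struct IcebergTableProps {
  int64_t row_group_size = 0;
  int64_t plain_page_size = 0;
  int64_t dict_page_size = 0;
};

// Placeholder constants for illustration only.
constexpr int64_t kHdfsBlockSize = 256LL * 1024 * 1024;
constexpr int64_t kHdfsBlockAlignment = 1024 * 1024;
constexpr int64_t kDefaultDataPageSize = 64 * 1024;
// Assumed per-column minimum; the real MinBlockSize() may differ.
constexpr int64_t kMinBytesPerColumn = 3 * kDefaultDataPageSize;

inline int64_t RoundUp(int64_t value, int64_t factor) {
  return (value + factor - 1) / factor * factor;
}

class ParquetWriterConfig {
 public:
  // Common path: computes the values shared by all table types.
  void Configure(const WriterOptions& opts, int num_columns) {
    assert(num_columns > 0);
    if (opts.parquet_file_size > 0) {
      // An explicit query option wins over everything else.
      block_size_ = opts.parquet_file_size;
      block_size_from_query_option_ = true;
    } else {
      block_size_ = std::max(kHdfsBlockSize, kMinBytesPerColumn * num_columns);
      block_size_from_query_option_ = false;
    }
    block_size_ = RoundUp(block_size_, kHdfsBlockAlignment);
    plain_page_size_ = kDefaultDataPageSize;
    dict_page_size_ = kDefaultDataPageSize;
  }

  // Iceberg path: overrides the common values with table properties.
  // Must be called after Configure().
  void ConfigureForIceberg(const IcebergTableProps& props) {
    if (!block_size_from_query_option_ && props.row_group_size > 0) {
      block_size_ = RoundUp(props.row_group_size, kHdfsBlockAlignment);
    }
    if (props.plain_page_size > 0) plain_page_size_ = props.plain_page_size;
    if (props.dict_page_size > 0) dict_page_size_ = props.dict_page_size;
  }

  int64_t block_size() const { return block_size_; }
  int64_t plain_page_size() const { return plain_page_size_; }
  int64_t dict_page_size() const { return dict_page_size_; }

 private:
  int64_t block_size_ = 0;
  int64_t plain_page_size_ = 0;
  int64_t dict_page_size_ = 0;
  bool block_size_from_query_option_ = false;
};
```

The writer would call Configure() once during initialization and 
ConfigureForIceberg() only when table_desc_->IsIcebergTable() is true, so the 
getters become plain member reads.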


http://gerrit.cloudera.org:8080/#/c/17654/4/fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java
File fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java:

http://gerrit.cloudera.org:8080/#/c/17654/4/fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java@96
PS4, Line 96: coded
Typo: codec


http://gerrit.cloudera.org:8080/#/c/17654/4/fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java:

http://gerrit.cloudera.org:8080/#/c/17654/4/fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java@191
PS4, Line 191:     if (IcebergTable.PARQUET_COMPRESSION_CODEC.equals(propKey)) return true;
             :     if (IcebergTable.PARQUET_COMPRESSION_LEVEL.equals(propKey)) return true;
             :     if (IcebergTable.PARQUET_ROW_GROUP_SIZE.equals(propKey)) return true;
             :     if (IcebergTable.PARQUET_PLAIN_PAGE_SIZE.equals(propKey)) return true;
             :     if (IcebergTable.PARQUET_DICT_PAGE_SIZE.equals(propKey)) return true;
I think these properties should be stored at the Iceberg table level as well.



--
To view, visit http://gerrit.cloudera.org:8080/17654
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3b8aa9a52c13c41b48310d2f7c9c7426e1ff5f23
Gerrit-Change-Number: 17654
Gerrit-PatchSet: 4
Gerrit-Owner: Attila Jeges <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>
Gerrit-Reviewer: wangsheng <[email protected]>
Gerrit-Comment-Date: Wed, 14 Jul 2021 12:50:51 +0000
Gerrit-HasComments: Yes
