Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/16143 )
Change subject: IMPALA-9741: Supported query Icebreg table by impala ...................................................................... Patch Set 6: (15 comments) http://gerrit.cloudera.org:8080/#/c/16143/6//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/16143/6//COMMIT_MSG@7 PS6, Line 7: Icebreg It's still misspelled http://gerrit.cloudera.org:8080/#/c/16143/6//COMMIT_MSG@7 PS6, Line 7: Supported query nit: Support querying http://gerrit.cloudera.org:8080/#/c/16143/6//COMMIT_MSG@26 PS6, Line 26: identity specify http://gerrit.cloudera.org:8080/#/c/16143/6/common/thrift/CatalogObjects.thrift File common/thrift/CatalogObjects.thrift: http://gerrit.cloudera.org:8080/#/c/16143/6/common/thrift/CatalogObjects.thrift@512 PS6, Line 512: source_cols_map nit: column_to_source_id ? http://gerrit.cloudera.org:8080/#/c/16143/6/common/thrift/CatalogObjects.thrift@515 PS6, Line 515: partition_col_to_source_id_map The mapping is reversed. Name it "source_id_to_partition" ? http://gerrit.cloudera.org:8080/#/c/16143/6/common/thrift/CatalogObjects.thrift@516 PS6, Line 516: map<string,THdfsFileDesc> file_descriptors Please follow the above conventions for naming maps. http://gerrit.cloudera.org:8080/#/c/16143/6/fe/src/main/java/org/apache/impala/analysis/IcebergPartitionField.java File fe/src/main/java/org/apache/impala/analysis/IcebergPartitionField.java: http://gerrit.cloudera.org:8080/#/c/16143/6/fe/src/main/java/org/apache/impala/analysis/IcebergPartitionField.java@28 PS6, Line 28: // The id of the source field in iceberg table Schema, you can get these source : // fields by Schema.columns(), the return type is List<NestedField>. Might worth rewording it a bit: "The id of the source column in the Iceberg table schema. The source column is used as the input for this partition field." http://gerrit.cloudera.org:8080/#/c/16143/6/fe/src/main/java/org/apache/impala/analysis/ShowStatsStmt.java File fe/src/main/java/org/apache/impala/analysis/ShowStatsStmt.java: http://gerrit.cloudera.org:8080/#/c/16143/6/fe/src/main/java/org/apache/impala/analysis/ShowStatsStmt.java@88 PS6, Line 88: if (table_ instanceof FeIcebergTable) { : if (((FeIcebergTable) table_).getPartitionColToSourceIdMap().isEmpty()) { : notPartitioned = true; : } Probably we should treat all Iceberg tables as not partitioned, since it's partitioning is different than other file system tables' partitioning. http://gerrit.cloudera.org:8080/#/c/16143/6/fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java File fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java: http://gerrit.cloudera.org:8080/#/c/16143/6/fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java@66 PS6, Line 66: getFileDescMap nit: getPartitionToFileDescMap http://gerrit.cloudera.org:8080/#/c/16143/6/fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java@219 PS6, Line 219: isPartitionTable nit: isPartitioned? http://gerrit.cloudera.org:8080/#/c/16143/6/fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java@258 PS6, Line 258: PartitionColToSourceId It returns a mapping from source ids to partition columns, therefore please name it "sourceIdToPartitionCol". http://gerrit.cloudera.org:8080/#/c/16143/6/fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java@271 PS6, Line 271: getSourceColsMap nit: getColumnToSourceIdMap? http://gerrit.cloudera.org:8080/#/c/16143/6/fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java@305 PS6, Line 305: nit: wrong indentation http://gerrit.cloudera.org:8080/#/c/16143/6/fe/src/main/java/org/apache/impala/util/IcebergUtil.java File fe/src/main/java/org/apache/impala/util/IcebergUtil.java: http://gerrit.cloudera.org:8080/#/c/16143/6/fe/src/main/java/org/apache/impala/util/IcebergUtil.java@114 PS6, Line 114: if (format == null) return null; : format = format.toUpperCase(); : if (format.equals("PARQUET")) { : return TIcebergFileFormat.PARQUET; : } : return null; How about: if ("PARQUET".equalsIgnoreCase(format)) return TIcebergFileFormat.PARQUET; return null; http://gerrit.cloudera.org:8080/#/c/16143/6/testdata/bin/generate-schema-statements.py File testdata/bin/generate-schema-statements.py: http://gerrit.cloudera.org:8080/#/c/16143/6/testdata/bin/generate-schema-statements.py@193 PS6, Line 193: 'iceberg': 'ICEBERG' You probably don't need to modify this file. I think adding HUDIPARQUET to this file was also unnecessary. Probably we can do the same thing that we did for Hudi, i.e. add the Iceberg tables under the functional_parquet database. https://gerrit.cloudera.org/c/14711/25/testdata/datasets/functional/schema_constraints.csv https://gerrit.cloudera.org/c/14711/25/testdata/datasets/functional/functional_schema_template.sql After that you should be able to load the Iceberg tables via Impala's data loading facilities (e.g. bin/load-data.py), and write some tests, like: https://gerrit.cloudera.org/c/14711/25/tests/query_test/test_scanners.py https://gerrit.cloudera.org/c/14711/25/testdata/workloads/functional-query/queries/QueryTest/hudi-parquet.test -- To view, visit http://gerrit.cloudera.org:8080/16143 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I856cfee4f3397d1a89cf17650e8d4fbfe1f2b006 Gerrit-Change-Number: 16143 Gerrit-PatchSet: 6 Gerrit-Owner: wangsheng <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]> Gerrit-Reviewer: wangsheng <[email protected]> Gerrit-Comment-Date: Wed, 15 Jul 2020 15:01:24 +0000 Gerrit-HasComments: Yes
