Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16143 )

Change subject: IMPALA-9741: Supported query Icebreg table by impala
......................................................................


Patch Set 6:

(15 comments)

http://gerrit.cloudera.org:8080/#/c/16143/6//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/16143/6//COMMIT_MSG@7
PS6, Line 7: Icebreg
It's still misspelled


http://gerrit.cloudera.org:8080/#/c/16143/6//COMMIT_MSG@7
PS6, Line 7: Supported query
nit: Support querying


http://gerrit.cloudera.org:8080/#/c/16143/6//COMMIT_MSG@26
PS6, Line 26: identity
specify


http://gerrit.cloudera.org:8080/#/c/16143/6/common/thrift/CatalogObjects.thrift
File common/thrift/CatalogObjects.thrift:

http://gerrit.cloudera.org:8080/#/c/16143/6/common/thrift/CatalogObjects.thrift@512
PS6, Line 512: source_cols_map
nit: column_to_source_id ?


http://gerrit.cloudera.org:8080/#/c/16143/6/common/thrift/CatalogObjects.thrift@515
PS6, Line 515: partition_col_to_source_id_map
The mapping is reversed. Name it "source_id_to_partition" ?


http://gerrit.cloudera.org:8080/#/c/16143/6/common/thrift/CatalogObjects.thrift@516
PS6, Line 516: map<string,THdfsFileDesc> file_descriptors
Please follow the above conventions for naming maps.


http://gerrit.cloudera.org:8080/#/c/16143/6/fe/src/main/java/org/apache/impala/analysis/IcebergPartitionField.java
File fe/src/main/java/org/apache/impala/analysis/IcebergPartitionField.java:

http://gerrit.cloudera.org:8080/#/c/16143/6/fe/src/main/java/org/apache/impala/analysis/IcebergPartitionField.java@28
PS6, Line 28:   // The id of the source field in iceberg table Schema, you can 
get these source
            :   // fields by Schema.columns(), the return type is 
List<NestedField>.
Might worth rewording it a bit:

"The id of the source column in the Iceberg table schema. The source column is 
used as the input for this partition field."


http://gerrit.cloudera.org:8080/#/c/16143/6/fe/src/main/java/org/apache/impala/analysis/ShowStatsStmt.java
File fe/src/main/java/org/apache/impala/analysis/ShowStatsStmt.java:

http://gerrit.cloudera.org:8080/#/c/16143/6/fe/src/main/java/org/apache/impala/analysis/ShowStatsStmt.java@88
PS6, Line 88:         if (table_ instanceof FeIcebergTable) {
            :           if (((FeIcebergTable) 
table_).getPartitionColToSourceIdMap().isEmpty()) {
            :             notPartitioned = true;
            :           }
Probably we should treat all Iceberg tables as not partitioned, since it's 
partitioning is different than other file system tables' partitioning.


http://gerrit.cloudera.org:8080/#/c/16143/6/fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java
File fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java:

http://gerrit.cloudera.org:8080/#/c/16143/6/fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java@66
PS6, Line 66: getFileDescMap
nit: getPartitionToFileDescMap


http://gerrit.cloudera.org:8080/#/c/16143/6/fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java@219
PS6, Line 219: isPartitionTable
nit: isPartitioned?


http://gerrit.cloudera.org:8080/#/c/16143/6/fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java@258
PS6, Line 258: PartitionColToSourceId
It returns a mapping from source ids to partition columns, therefore please 
name it "sourceIdToPartitionCol".


http://gerrit.cloudera.org:8080/#/c/16143/6/fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java@271
PS6, Line 271: getSourceColsMap
nit: getColumnToSourceIdMap?


http://gerrit.cloudera.org:8080/#/c/16143/6/fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java@305
PS6, Line 305:
nit: wrong indentation


http://gerrit.cloudera.org:8080/#/c/16143/6/fe/src/main/java/org/apache/impala/util/IcebergUtil.java
File fe/src/main/java/org/apache/impala/util/IcebergUtil.java:

http://gerrit.cloudera.org:8080/#/c/16143/6/fe/src/main/java/org/apache/impala/util/IcebergUtil.java@114
PS6, Line 114:     if (format == null) return null;
             :     format = format.toUpperCase();
             :     if (format.equals("PARQUET")) {
             :       return TIcebergFileFormat.PARQUET;
             :     }
             :     return null;
How about:

 if ("PARQUET".equalsIgnoreCase(format)) return TIcebergFileFormat.PARQUET;
 return null;


http://gerrit.cloudera.org:8080/#/c/16143/6/testdata/bin/generate-schema-statements.py
File testdata/bin/generate-schema-statements.py:

http://gerrit.cloudera.org:8080/#/c/16143/6/testdata/bin/generate-schema-statements.py@193
PS6, Line 193:   'iceberg': 'ICEBERG'
You probably don't need to modify this file. I think adding HUDIPARQUET to this 
file was also unnecessary.

Probably we can do the same thing that we did for Hudi, i.e. add the Iceberg 
tables under the functional_parquet database.

https://gerrit.cloudera.org/c/14711/25/testdata/datasets/functional/schema_constraints.csv
https://gerrit.cloudera.org/c/14711/25/testdata/datasets/functional/functional_schema_template.sql

After that you should be able to load the Iceberg tables via Impala's data 
loading facilities (e.g. bin/load-data.py), and write some tests, like:

https://gerrit.cloudera.org/c/14711/25/tests/query_test/test_scanners.py
https://gerrit.cloudera.org/c/14711/25/testdata/workloads/functional-query/queries/QueryTest/hudi-parquet.test



--
To view, visit http://gerrit.cloudera.org:8080/16143
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I856cfee4f3397d1a89cf17650e8d4fbfe1f2b006
Gerrit-Change-Number: 16143
Gerrit-PatchSet: 6
Gerrit-Owner: wangsheng <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>
Gerrit-Reviewer: wangsheng <[email protected]>
Gerrit-Comment-Date: Wed, 15 Jul 2020 15:01:24 +0000
Gerrit-HasComments: Yes

Reply via email to