Vihang Karajgaonkar has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15797 )

Change subject: IMPALA-9688: Support create iceberg table by impala
......................................................................


Patch Set 22:

(11 comments)

Thanks for bearing with me. I left some comments below. The patch is good to go 
from my side if you can address those in this patch (or create follow up JIRAs 
for addressing them).

http://gerrit.cloudera.org:8080/#/c/15797/21//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/15797/21//COMMIT_MSG@11
PS21, Line 11: create table
> Yes, iceberg supported some dml operation as well. I'm working on IMPALA-97
thanks for the clarification.


http://gerrit.cloudera.org:8080/#/c/15797/21//COMMIT_MSG@18
PS21, Line 18:     partition by spec(
             :         level identity,
             :         event_time identity,
             :         event_time hour,
             :         register_time day
             :     )
> 1. I refer to the syntax of kudu table DDl: partition by hash/range, and I
Thanks. I noticed that Presto has some support for Iceberg as per 
https://prestodb.io/docs/current/release/release-0.123.html but I am not sure 
how different or similar it is with this.


http://gerrit.cloudera.org:8080/#/c/15797/21/fe/src/main/cup/sql-parser.cup
File fe/src/main/cup/sql-parser.cup:

http://gerrit.cloudera.org:8080/#/c/15797/21/fe/src/main/cup/sql-parser.cup@1607
PS21, Line 1607: createIcebergPartitionedLayout
What happens when an invalid transform is provided in the input? This method 
throws a TableLoadingException which is a bit weird to see during the Parsing 
stage. In case of Kudu we don't have that problem since the KW_HASH and 
KW_RANGE have specific keywords defined.


http://gerrit.cloudera.org:8080/#/c/15797/21/fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java
File fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java:

http://gerrit.cloudera.org:8080/#/c/15797/21/fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java@256
PS21, Line 256:     // schema. Likewise for external Kudu tables, the schema 
can be read from Kudu.
              :     if (getColumnDefs().isEmpty() && getFileFormat() != 
THdfsFileFormat.AVRO
              :         && getFileFormat() != THdfsFileFormat.KUDU) {
              :
Can Iceberg tables have 0 columns? If not, may be add a check here too for 
Iceberg.


http://gerrit.cloudera.org:8080/#/c/15797/22/fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java
File fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java:

http://gerrit.cloudera.org:8080/#/c/15797/22/fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java@143
PS22, Line 143: getIcebergPartitionSpecs
Is it possible that the user creates a iceberg table without providing a 
partition by spec clause? What is the behavior in this case for table loading 
code?

I think it would great if you could add some positive and negative tests 
similar to AnalyzeKuduDDLTest (I think its okay to do it as a follow up in 
separate JIRA).


http://gerrit.cloudera.org:8080/#/c/15797/22/fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java@268
PS22, Line 268:     }
              :
              :     if (getFileFormat() == THdfsFileFormat.ICEBERG) {
              :       analyzeIcebergFormat();
              :     }
If you look at how Kudu table analysis is implemented, 
tableDef_.analyze(analyzer) method in line 253 makes sure that the if there are 
any Kudu options used, the table is indeed a Kudu table. So may be we should do 
it here as well for Iceberg specific options.

Also, I don't see the IcebergPartitionSpec itself getting analyzed anywhere. 
For example, if there are any column types which are not supported we should 
throw analysis exception if user tries to provide such columns. Also, we should 
make sure that the partition spec column names exist in the table column names 
if that is disallowed. I think it will be cleaner if we call 
IcebergPartitionSpec.analyze() method here so that we can plugin any checks in 
IcebergPartitionField during the createTableStmt analysis phase. So may be add 
something like for the sake of completeness.

for (IcebergPartitionSpec partitionSpec : getIceBergPartitionSpecs) {
  partitionSpec.analyze();
}


http://gerrit.cloudera.org:8080/#/c/15797/22/fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java@579
PS22, Line 579: putGeneratedKuduProperty
Can you please rename this to putGeneratedProperty since this method is shared 
by Kudu and Iceberg now?


http://gerrit.cloudera.org:8080/#/c/15797/22/fe/src/main/java/org/apache/impala/analysis/IcebergPartitionField.java
File fe/src/main/java/org/apache/impala/analysis/IcebergPartitionField.java:

http://gerrit.cloudera.org:8080/#/c/15797/22/fe/src/main/java/org/apache/impala/analysis/IcebergPartitionField.java@28
PS22, Line 28:   //Column source id from PartitionField
             :   private int sourceId_;
             :
             :   //Column id from iceberg PartitionField
             :   private int fieldId_;
I think it would be great if we add more documentation about these fields and 
how they are used by Iceberg. Specifically, I see that during the parsing 
stage, we always create IcebergPartitionField with the sourceId and fieldId as 
0,0.

Do these ids ever change?


http://gerrit.cloudera.org:8080/#/c/15797/22/fe/src/main/java/org/apache/impala/analysis/TableDataLayout.java
File fe/src/main/java/org/apache/impala/analysis/TableDataLayout.java:

http://gerrit.cloudera.org:8080/#/c/15797/22/fe/src/main/java/org/apache/impala/analysis/TableDataLayout.java@24
PS22, Line 24: PARTITION BY and PARTITIONED BY
Can you update this javadoc to include that this is used by Iceberg's partition 
by spec also?


http://gerrit.cloudera.org:8080/#/c/15797/21/fe/src/main/java/org/apache/impala/catalog/IcebergTable.java
File fe/src/main/java/org/apache/impala/catalog/IcebergTable.java:

http://gerrit.cloudera.org:8080/#/c/15797/21/fe/src/main/java/org/apache/impala/catalog/IcebergTable.java@231
PS21, Line 231:       for (PartitionField field : spec.fields()) {
              :         fields.add(new IcebergPartitionField(field.sourceId(), 
field.fieldId(),
              :             field.name(), 
IcebergUtil.getPartitionTransform(field)));
              :       }
Do we need to make sure that the field id or name exists in the table column 
spec? For instance if a table is created as

create table iceberg_test(
        level string,
        event_time timestamp,
    )
    partition by spec(
        level identity,
        event_time hour,
    )
    stored as iceberg;

Can the specs here include a name which is not in (level, event_time)?


http://gerrit.cloudera.org:8080/#/c/15797/21/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

http://gerrit.cloudera.org:8080/#/c/15797/21/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@2547
PS21, Line 2547: msClient.getHiveClient().createTable(newTable);
> Yes, you are right, iceberg table is always unpartitioned in hms. I don't u
What I meant was to add a comment here saying that "Iceberg tables are always 
unpartitioned. The partition columns are derived from the partition spec is 
derived from The TCreateTableParams field and could include one or more of the 
table columns". Also, may be make sure to add 
Preconditions.checkState(newTable.getPartitionKeys() == null || 
newTable.getPartitionKeys().isEmpty()) as a sanity check.



--
To view, visit http://gerrit.cloudera.org:8080/15797
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8d85db4c904a8c758c4cfb4f19cfbdab7e6ea284
Gerrit-Change-Number: 15797
Gerrit-PatchSet: 22
Gerrit-Owner: wangsheng <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Sahil Takiar <[email protected]>
Gerrit-Reviewer: Vihang Karajgaonkar <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>
Gerrit-Reviewer: wangsheng <[email protected]>
Gerrit-Comment-Date: Mon, 15 Jun 2020 19:40:02 +0000
Gerrit-HasComments: Yes

Reply via email to