Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16143 )

Change subject: IMPALA-9741: Support querying Iceberg table by impala
......................................................................


Patch Set 18:

(10 comments)

One thing I didn't understand about the patch is the support for partition 
pruning. I don't see enough code here to handle all the edge cases - partition 
evolution, different partition transforms, etc.

I'm OK with deferring optimizations to a later patch, but want to make sure 
that this patch is correct and that we document what the limitations are.

http://gerrit.cloudera.org:8080/#/c/16143/18/fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java
File fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java:

http://gerrit.cloudera.org:8080/#/c/16143/18/fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java@611
PS18, Line 611:     // Iceberg table only has one partition spec now
What does this mean? We don't support partition evolution yet?


http://gerrit.cloudera.org:8080/#/c/16143/18/fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java@615
PS18, Line 615: filedName
nit: fieldName


http://gerrit.cloudera.org:8080/#/c/16143/18/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

http://gerrit.cloudera.org:8080/#/c/16143/18/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1889
PS18, Line 1889: public
Can you make this protected instead of public, since it's only being used 
within this package.


http://gerrit.cloudera.org:8080/#/c/16143/18/fe/src/main/java/org/apache/impala/catalog/local/LocalFsTable.java
File fe/src/main/java/org/apache/impala/catalog/local/LocalFsTable.java:

http://gerrit.cloudera.org:8080/#/c/16143/18/fe/src/main/java/org/apache/impala/catalog/local/LocalFsTable.java@372
PS18, Line 372: public
I think this can be protected


http://gerrit.cloudera.org:8080/#/c/16143/18/fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java
File fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java:

http://gerrit.cloudera.org:8080/#/c/16143/18/fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java@61
PS18, Line 61: icebergConjunts_
nit: icebergConjuncts_


http://gerrit.cloudera.org:8080/#/c/16143/18/fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java@94
PS18, Line 94: Icber
nit: Iceberg


http://gerrit.cloudera.org:8080/#/c/16143/18/fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java@123
PS18, Line 123:         it.remove();
I think this predicate pushdown needs a bit more explanation.

As far as I can see there are three possibilities for predicate pushdown:
1. The column is not part of any Iceberg partition expression so cannot be 
pushed down at all.
2. The column is part of all partition keys without any transformation (i.e. 
the identity transformation). https://iceberg.apache.org/spec/#partitioning - 
the predicate can be pushed down and does not need to be removed at all.
3. The column is part of some or all partition keys, or is transformed in some 
way during partitioning - we can push down the predicate to get partition 
pruning, but also need to evaluate it in the scan. I guess the two cases here 
are that there has been some partition evolution over time, and that there is 
some transform of the partition column.

It's not clear how the code is handling #3. I think this needs to be clarified.

The strict vs inclusive projection in the spec might be useful for explaining 
it - https://iceberg.apache.org/spec/#scan-planning


http://gerrit.cloudera.org:8080/#/c/16143/18/fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java@226
PS18, Line 226:    * Returns true if this column does not belong to iceberg 
partition column.
What does this mean in terms of the iceberg spec - 
https://iceberg.apache.org/spec/#partitioning. Does this mean that it only 
returns true if the column is part of the partition with an identity transform?

What about if partition evolution has happened and it is a partition column for 
some but not all columns?


http://gerrit.cloudera.org:8080/#/c/16143/18/testdata/data/README
File testdata/data/README:

http://gerrit.cloudera.org:8080/#/c/16143/18/testdata/data/README@508
PS18, Line 508: <<<<<<< HEAD
nit: remove this merge marker


http://gerrit.cloudera.org:8080/#/c/16143/18/testdata/workloads/functional-query/queries/QueryTest/iceberg-query.test
File testdata/workloads/functional-query/queries/QueryTest/iceberg-query.test:

http://gerrit.cloudera.org:8080/#/c/16143/18/testdata/workloads/functional-query/queries/QueryTest/iceberg-query.test@3
PS18, Line 3: Icebreg
nit: Iceberg



--
To view, visit http://gerrit.cloudera.org:8080/16143
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I856cfee4f3397d1a89cf17650e8d4fbfe1f2b006
Gerrit-Change-Number: 16143
Gerrit-PatchSet: 18
Gerrit-Owner: wangsheng <[email protected]>
Gerrit-Reviewer: Anonymous Coward (606)
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>
Gerrit-Reviewer: wangsheng <[email protected]>
Gerrit-Comment-Date: Mon, 10 Aug 2020 17:53:31 +0000
Gerrit-HasComments: Yes

Reply via email to