[Impala-ASF-CR] IMPALA-10432: INSERT INTO Iceberg tables with partition transforms

Zoltan Borok-Nagy (Code Review) Thu, 14 Jan 2021 08:45:00 -0800

Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16939 )


Change subject: IMPALA-10432: INSERT INTO Iceberg tables with partition 
transforms
......................................................................


Patch Set 3:

(4 comments)

Thanks for the comments!

http://gerrit.cloudera.org:8080/#/c/16939/2/fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java
File fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java:

http://gerrit.cloudera.org:8080/#/c/16939/2/fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java@198
PS2, Line 198:         if (unixMicros >= 0) {
> This means that if unixMicros<0, then we won't do partition pruning, but qu
Yes, I also have some test queries that have predicates with timestamps before 
the unix epoch.


http://gerrit.cloudera.org:8080/#/c/16939/2/fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java:

http://gerrit.cloudera.org:8080/#/c/16939/2/fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java@186
PS2, Line 186:           partSpec.partitionType(),
             :           feIcebergTable.getDefaultPartitionSpec(), 
dataFile.partitionPath()
> nit: +2 indent
Done


http://gerrit.cloudera.org:8080/#/c/16939/2/testdata/workloads/functional-planner/queries/PlannerTest/insert.test
File testdata/workloads/functional-planner/queries/PlannerTest/insert.test:

http://gerrit.cloudera.org:8080/#/c/16939/2/testdata/workloads/functional-planner/queries/PlannerTest/insert.test@976
PS2, Line 976:        cast(id * 3 as int),
             :        cast(10000 - id as int
> nit: +1 indent?
Done


http://gerrit.cloudera.org:8080/#/c/16939/2/testdata/workloads/functional-planner/queries/PlannerTest/insert.test@984
PS2, Line 984: year
> A potential optimization in timestamp based partitioning expressions could
This optimization could interfere with SORT BY expressions. E.g. if we want to 
SORT BY ZORDER (columnA, columnB), then ordering by timestampCol instead of 
year(timestampCol) would make no room for ZORDERing because the data would be 
majorly sorted by the timestampCol.

If there is no SORT BY, then yes, it could probably make the INSERT more 
efficient, but it would also complicate the code. I'd rather not do that in 
this commit. But we'll definitely need thorough performance analysis once the 
main features are completed.



--
To view, visit http://gerrit.cloudera.org:8080/16939
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3edf02048cea78703837b248c55219c22d512b78
Gerrit-Change-Number: 16939
Gerrit-PatchSet: 3
Gerrit-Owner: Zoltan Borok-Nagy <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Gabor Kaszab <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>
Gerrit-Reviewer: wangsheng <[email protected]>
Gerrit-Comment-Date: Thu, 14 Jan 2021 16:44:41 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10432: INSERT INTO Iceberg tables with partition transforms

Reply via email to