Gabor Kaszab has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16939 )

Change subject: IMPALA-10432: INSERT INTO Iceberg tables with partition 
transforms
......................................................................


Patch Set 5:

(6 comments)

Hey Zoltan,
Great work! I went through the test part of this patch and saw some 
misalignment between the implementation and the Iceberg spec. It might be that 
the spec is outdated or not articulating the expected behaviour well. Just 
wanted to make 100% sure. Left some comments about this in the test code.

http://gerrit.cloudera.org:8080/#/c/16939/5/testdata/workloads/functional-query/queries/QueryTest/iceberg-partition-transform-insert.test
File 
testdata/workloads/functional-query/queries/QueryTest/iceberg-partition-transform-insert.test:

http://gerrit.cloudera.org:8080/#/c/16939/5/testdata/workloads/functional-query/queries/QueryTest/iceberg-partition-transform-insert.test@365
PS5, Line 365: t_year=1969
I checked the Iceberg spec and it says that year transform should return an int 
as the number of years from 1970. Is it expected to have partition values like 
1969, 1970 and 2021 here?


http://gerrit.cloudera.org:8080/#/c/16939/5/testdata/workloads/functional-query/queries/QueryTest/iceberg-partition-transform-insert.test@487
PS5, Line 487: t_month=1970-01
Just for my understanding: According to Iceberg spec month transform should 
return an int, number fo months since 1970-01-01. So shouldn't we see an int 
number as a partition value instead of a "year-month" formatted string?
(Haven't checked the behaviour of Iceberg just read the spec, though)


http://gerrit.cloudera.org:8080/#/c/16939/5/testdata/workloads/functional-query/queries/QueryTest/iceberg-partition-transform-insert.test@691
PS5, Line 691: create table hour_transform(t timestamp)
Do we have a negative test where HOUR partition transform is used on a date 
field?


http://gerrit.cloudera.org:8080/#/c/16939/5/testdata/workloads/functional-query/queries/QueryTest/iceberg-partition-transform-insert.test@723
PS5, Line 723: 1969-12-31-22
Similar comment I made above about Iceberg spec: "as hours from 1970-01-01 
00:00:00" shouldn't these partition values be an int value instead of a 
formatted string?


http://gerrit.cloudera.org:8080/#/c/16939/5/testdata/workloads/functional-query/queries/QueryTest/iceberg-partition-transform-insert.test@869
PS5, Line 869: ====
I miss some tests where null values are inserted into partition columns. Do you 
think that would make sense?


http://gerrit.cloudera.org:8080/#/c/16939/5/testdata/workloads/functional-query/queries/QueryTest/iceberg-partition-transform-insert.test@870
PS5, Line 870:
Well just throwing out ideas but would it make sense to have a partitioned 
Iceberg table written by another system (e.g. Spark, or Hive if write support 
is implemented) and then make an INTSERT INTO impala_tbl FROM SELECT * FROM 
spark_tbl or something similar. The we can check if Impala created the same 
partitions with the same partition values e.g.



--
To view, visit http://gerrit.cloudera.org:8080/16939
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3edf02048cea78703837b248c55219c22d512b78
Gerrit-Change-Number: 16939
Gerrit-PatchSet: 5
Gerrit-Owner: Zoltan Borok-Nagy <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Gabor Kaszab <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>
Gerrit-Reviewer: wangsheng <[email protected]>
Gerrit-Comment-Date: Mon, 18 Jan 2021 16:50:08 +0000
Gerrit-HasComments: Yes

Reply via email to