Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/16939 )
Change subject: IMPALA-10432: INSERT INTO Iceberg tables with partition transforms ...................................................................... Patch Set 5: (6 comments) Hey Zoltan, Great work! I went through the test part of this patch and saw some misalignment between the implementation and the Iceberg spec. It might be that the spec is outdated or not articulating the expected behaviour well. Just wanted to make 100% sure. Left some comments about this in the test code. http://gerrit.cloudera.org:8080/#/c/16939/5/testdata/workloads/functional-query/queries/QueryTest/iceberg-partition-transform-insert.test File testdata/workloads/functional-query/queries/QueryTest/iceberg-partition-transform-insert.test: http://gerrit.cloudera.org:8080/#/c/16939/5/testdata/workloads/functional-query/queries/QueryTest/iceberg-partition-transform-insert.test@365 PS5, Line 365: t_year=1969 I checked the Iceberg spec and it says that year transform should return an int as the number of years from 1970. Is it expected to have partition values like 1969, 1970 and 2021 here? http://gerrit.cloudera.org:8080/#/c/16939/5/testdata/workloads/functional-query/queries/QueryTest/iceberg-partition-transform-insert.test@487 PS5, Line 487: t_month=1970-01 Just for my understanding: According to Iceberg spec month transform should return an int, number fo months since 1970-01-01. So shouldn't we see an int number as a partition value instead of a "year-month" formatted string? (Haven't checked the behaviour of Iceberg just read the spec, though) http://gerrit.cloudera.org:8080/#/c/16939/5/testdata/workloads/functional-query/queries/QueryTest/iceberg-partition-transform-insert.test@691 PS5, Line 691: create table hour_transform(t timestamp) Do we have a negative test where HOUR partition transform is used on a date field? http://gerrit.cloudera.org:8080/#/c/16939/5/testdata/workloads/functional-query/queries/QueryTest/iceberg-partition-transform-insert.test@723 PS5, Line 723: 1969-12-31-22 Similar comment I made above about Iceberg spec: "as hours from 1970-01-01 00:00:00" shouldn't these partition values be an int value instead of a formatted string? http://gerrit.cloudera.org:8080/#/c/16939/5/testdata/workloads/functional-query/queries/QueryTest/iceberg-partition-transform-insert.test@869 PS5, Line 869: ==== I miss some tests where null values are inserted into partition columns. Do you think that would make sense? http://gerrit.cloudera.org:8080/#/c/16939/5/testdata/workloads/functional-query/queries/QueryTest/iceberg-partition-transform-insert.test@870 PS5, Line 870: Well just throwing out ideas but would it make sense to have a partitioned Iceberg table written by another system (e.g. Spark, or Hive if write support is implemented) and then make an INTSERT INTO impala_tbl FROM SELECT * FROM spark_tbl or something similar. The we can check if Impala created the same partitions with the same partition values e.g. -- To view, visit http://gerrit.cloudera.org:8080/16939 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3edf02048cea78703837b248c55219c22d512b78 Gerrit-Change-Number: 16939 Gerrit-PatchSet: 5 Gerrit-Owner: Zoltan Borok-Nagy <[email protected]> Gerrit-Reviewer: Csaba Ringhofer <[email protected]> Gerrit-Reviewer: Gabor Kaszab <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]> Gerrit-Reviewer: wangsheng <[email protected]> Gerrit-Comment-Date: Mon, 18 Jan 2021 16:50:08 +0000 Gerrit-HasComments: Yes
