[Impala-ASF-CR] IMPALA-12729: Allow creating primary keys for Iceberg tables

2024-03-28 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/21149 )

Change subject: IMPALA-12729: Allow creating primary keys for Iceberg tables
..

IMPALA-12729: Allow creating primary keys for Iceberg tables

There are writer engines that use Iceberg's identifier-field-ids from
the Iceberg schema to identify the columns to be written into the
equality delete files (Flink, NiFi). So far Impala wasn't able to
populate this identifier-field-ids. This patch introduces the support
for not enforced primary keys for Iceberg tables, where the primary key
is going to be used for setting identifier-field-ids during Iceberg
schema creation.

Example syntax:
CREATE TABLE ice_tbl (
  i int NOT NULL,
  j int,
  s string NOT NULL
  primary key(i, s) not enforced)
PARTITIONED BY SPEC (truncate(10, s))
STORED AS ICEBERG;

There are some constraints with primary keys (PK) following the
behavior of Flink:
 - Only NOT NULL columns can be in the PK.
 - PK is not allowed in the column definition level like
   'i int NOT NULL PRIMARY KEY'.
 - If the table is partitioned then the partition columns have to be
   part of the PK.
 - Float and double columns are not allowed for the PK.
 - Not allowed to drop a column that is used as a PK.

Testing:
 - New E2E tests added for different table creation scenarios.
 - Manual test to use Nifi for writing into a table with PK.

Change-Id: I7bea787acdabd8cb04661f4ddb5c3309af0364a6
Reviewed-on: http://gerrit.cloudera.org:8080/21149
Reviewed-by: Daniel Becker 
Tested-by: Impala Public Jenkins 
---
M fe/src/main/cup/sql-parser.cup
M fe/src/main/java/org/apache/impala/analysis/ColumnDef.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java
M fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java
M fe/src/main/java/org/apache/impala/analysis/TableDef.java
M fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/IcebergColumn.java
M fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergCtasTarget.java
M fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergMetadataTable.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/util/IcebergSchemaConverter.java
M fe/src/main/jflex/sql-scanner.flex
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
M testdata/workloads/functional-query/queries/QueryTest/iceberg-create.test
M 
testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-negative.test
M tests/custom_cluster/test_lineage.py
18 files changed, 388 insertions(+), 119 deletions(-)

Approvals:
  Daniel Becker: Looks good to me, approved
  Impala Public Jenkins: Verified

--
To view, visit http://gerrit.cloudera.org:8080/21149
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I7bea787acdabd8cb04661f4ddb5c3309af0364a6
Gerrit-Change-Number: 21149
Gerrit-PatchSet: 12
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-12729: Allow creating primary keys for Iceberg tables

2024-03-28 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21149 )

Change subject: IMPALA-12729: Allow creating primary keys for Iceberg tables
..


Patch Set 11: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/21149
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7bea787acdabd8cb04661f4ddb5c3309af0364a6
Gerrit-Change-Number: 21149
Gerrit-PatchSet: 11
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Thu, 28 Mar 2024 13:57:06 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12729: Allow creating primary keys for Iceberg tables

2024-03-28 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21149 )

Change subject: IMPALA-12729: Allow creating primary keys for Iceberg tables
..


Patch Set 11:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/10450/ 
DRY_RUN=false


--
To view, visit http://gerrit.cloudera.org:8080/21149
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7bea787acdabd8cb04661f4ddb5c3309af0364a6
Gerrit-Change-Number: 21149
Gerrit-PatchSet: 11
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Thu, 28 Mar 2024 11:05:09 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12729: Allow creating primary keys for Iceberg tables

2024-03-28 Thread Daniel Becker (Code Review)
Daniel Becker has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21149 )

Change subject: IMPALA-12729: Allow creating primary keys for Iceberg tables
..


Patch Set 11: Code-Review+2

Thanks Gábor!


--
To view, visit http://gerrit.cloudera.org:8080/21149
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7bea787acdabd8cb04661f4ddb5c3309af0364a6
Gerrit-Change-Number: 21149
Gerrit-PatchSet: 11
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Thu, 28 Mar 2024 10:50:33 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12729: Allow creating primary keys for Iceberg tables

2024-03-28 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21149 )

Change subject: IMPALA-12729: Allow creating primary keys for Iceberg tables
..


Patch Set 10:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/15707/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/21149
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7bea787acdabd8cb04661f4ddb5c3309af0364a6
Gerrit-Change-Number: 21149
Gerrit-PatchSet: 10
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Thu, 28 Mar 2024 09:13:27 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12729: Allow creating primary keys for Iceberg tables

2024-03-28 Thread Gabor Kaszab (Code Review)
Hello Daniel Becker, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/21149

to look at the new patch set (#10).

Change subject: IMPALA-12729: Allow creating primary keys for Iceberg tables
..

IMPALA-12729: Allow creating primary keys for Iceberg tables

There are writer engines that use Iceberg's identifier-field-ids from
the Iceberg schema to identify the columns to be written into the
equality delete files (Flink, NiFi). So far Impala wasn't able to
populate this identifier-field-ids. This patch introduces the support
for not enforced primary keys for Iceberg tables, where the primary key
is going to be used for setting identifier-field-ids during Iceberg
schema creation.

Example syntax:
CREATE TABLE ice_tbl (
  i int NOT NULL,
  j int,
  s string NOT NULL
  primary key(i, s) not enforced)
PARTITIONED BY SPEC (truncate(10, s))
STORED AS ICEBERG;

There are some constraints with primary keys (PK) following the
behavior of Flink:
 - Only NOT NULL columns can be in the PK.
 - PK is not allowed in the column definition level like
   'i int NOT NULL PRIMARY KEY'.
 - If the table is partitioned then the partition columns have to be
   part of the PK.
 - Float and double columns are not allowed for the PK.
 - Not allowed to drop a column that is used as a PK.

Testing:
 - New E2E tests added for different table creation scenarios.
 - Manual test to use Nifi for writing into a table with PK.

Change-Id: I7bea787acdabd8cb04661f4ddb5c3309af0364a6
---
M fe/src/main/cup/sql-parser.cup
M fe/src/main/java/org/apache/impala/analysis/ColumnDef.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java
M fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java
M fe/src/main/java/org/apache/impala/analysis/TableDef.java
M fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/IcebergColumn.java
M fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergCtasTarget.java
M fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergMetadataTable.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/util/IcebergSchemaConverter.java
M fe/src/main/jflex/sql-scanner.flex
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
M testdata/workloads/functional-query/queries/QueryTest/iceberg-create.test
M 
testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-negative.test
M tests/custom_cluster/test_lineage.py
18 files changed, 388 insertions(+), 119 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/49/21149/10
--
To view, visit http://gerrit.cloudera.org:8080/21149
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7bea787acdabd8cb04661f4ddb5c3309af0364a6
Gerrit-Change-Number: 21149
Gerrit-PatchSet: 10
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-12729: Allow creating primary keys for Iceberg tables

2024-03-28 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21149 )

Change subject: IMPALA-12729: Allow creating primary keys for Iceberg tables
..


Patch Set 11:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/10446/ 
DRY_RUN=false


--
To view, visit http://gerrit.cloudera.org:8080/21149
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7bea787acdabd8cb04661f4ddb5c3309af0364a6
Gerrit-Change-Number: 21149
Gerrit-PatchSet: 11
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Thu, 28 Mar 2024 08:52:00 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12729: Allow creating primary keys for Iceberg tables

2024-03-28 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21149 )

Change subject: IMPALA-12729: Allow creating primary keys for Iceberg tables
..


Patch Set 9:

Build Failed

https://jenkins.impala.io/job/gerrit-code-review-checks/15706/ : Initial code 
review checks failed. See linked job for details on the failure.


--
To view, visit http://gerrit.cloudera.org:8080/21149
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7bea787acdabd8cb04661f4ddb5c3309af0364a6
Gerrit-Change-Number: 21149
Gerrit-PatchSet: 9
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Thu, 28 Mar 2024 08:47:55 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12729: Allow creating primary keys for Iceberg tables

2024-03-28 Thread Gabor Kaszab (Code Review)
Gabor Kaszab has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21149 )

Change subject: IMPALA-12729: Allow creating primary keys for Iceberg tables
..


Patch Set 9:

After a rebase with master some Iceberg metadata table scanning tests broke. 
Apparently, this patch collides with the Array col querying from metadata 
tables resulting a Precondition check fail in TupleDescriptor.getNumNullBits().
The reason apparently, is that a struct is expected to be nullable. However, 
now with this change when in SlotDescriptor.setPath() I update the nullability 
flag, for metadata tables all the IcebergColumns will be not nullable. The 
IcebergColumns nullability comes from the Iceberg Schema for metadata tables 
from the Iceberg lib, and apparently e.g. partitions.partition is a struct but 
also not nullable and this gets caught byt the mentioned precondition. As a 
workaround I set all the metadata table cols nullable. Since we don't write 
these this causes no harm.


--
To view, visit http://gerrit.cloudera.org:8080/21149
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7bea787acdabd8cb04661f4ddb5c3309af0364a6
Gerrit-Change-Number: 21149
Gerrit-PatchSet: 9
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Thu, 28 Mar 2024 08:34:36 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12729: Allow creating primary keys for Iceberg tables

2024-03-28 Thread Gabor Kaszab (Code Review)
Hello Daniel Becker, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/21149

to look at the new patch set (#9).

Change subject: IMPALA-12729: Allow creating primary keys for Iceberg tables
..

IMPALA-12729: Allow creating primary keys for Iceberg tables

There are writer engines that use Iceberg's identifier-field-ids from
the Iceberg schema to identify the columns to be written into the
equality delete files (Flink, NiFi). So far Impala wasn't able to
populate this identifier-field-ids. This patch introduces the support
for not enforced primary keys for Iceberg tables, where the primary key
is going to be used for setting identifier-field-ids during Iceberg
schema creation.

Example syntax:
CREATE TABLE ice_tbl (
  i int NOT NULL,
  j int,
  s string NOT NULL
  primary key(i, s) not enforced)
PARTITIONED BY SPEC (truncate(10, s))
STORED AS ICEBERG;

There are some constraints with primary keys (PK) following the
behavior of Flink:
 - Only NOT NULL columns can be in the PK.
 - PK is not allowed in the column definition level like
   'i int NOT NULL PRIMARY KEY'.
 - If the table is partitioned then the partition columns have to be
   part of the PK.
 - Float and double columns are not allowed for the PK.
 - Not allowed to drop a column that is used as a PK.

Testing:
 - New E2E tests added for different table creation scenarios.
 - Manual test to use Nifi for writing into a table with PK.

Change-Id: I7bea787acdabd8cb04661f4ddb5c3309af0364a6
---
M fe/src/main/cup/sql-parser.cup
M fe/src/main/java/org/apache/impala/analysis/ColumnDef.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java
M fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java
M fe/src/main/java/org/apache/impala/analysis/TableDef.java
M fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/IcebergColumn.java
M fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergCtasTarget.java
M fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergMetadataTable.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/util/IcebergSchemaConverter.java
M fe/src/main/jflex/sql-scanner.flex
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
M testdata/workloads/functional-query/queries/QueryTest/iceberg-create.test
M 
testdata/workloads/functional-query/queries/QueryTest/iceberg-metadata-tables.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-negative.test
M tests/custom_cluster/test_lineage.py
18 files changed, 386 insertions(+), 119 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/49/21149/9
--
To view, visit http://gerrit.cloudera.org:8080/21149
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7bea787acdabd8cb04661f4ddb5c3309af0364a6
Gerrit-Change-Number: 21149
Gerrit-PatchSet: 9
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-12729: Allow creating primary keys for Iceberg tables

2024-03-28 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21149 )

Change subject: IMPALA-12729: Allow creating primary keys for Iceberg tables
..


Patch Set 9:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/21149/9/fe/src/main/java/org/apache/impala/catalog/IcebergColumn.java
File fe/src/main/java/org/apache/impala/catalog/IcebergColumn.java:

http://gerrit.cloudera.org:8080/#/c/21149/9/fe/src/main/java/org/apache/impala/catalog/IcebergColumn.java@51
PS9, Line 51:   public static IcebergColumn cloneWithNullability(IcebergColumn 
source, boolean isNullable) {
line too long (94 > 90)



--
To view, visit http://gerrit.cloudera.org:8080/21149
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7bea787acdabd8cb04661f4ddb5c3309af0364a6
Gerrit-Change-Number: 21149
Gerrit-PatchSet: 9
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Thu, 28 Mar 2024 08:28:43 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12729: Allow creating primary keys for Iceberg tables

2024-03-27 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21149 )

Change subject: IMPALA-12729: Allow creating primary keys for Iceberg tables
..


Patch Set 7: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/10440/


--
To view, visit http://gerrit.cloudera.org:8080/21149
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7bea787acdabd8cb04661f4ddb5c3309af0364a6
Gerrit-Change-Number: 21149
Gerrit-PatchSet: 7
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Wed, 27 Mar 2024 21:29:06 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12729: Allow creating primary keys for Iceberg tables

2024-03-27 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21149 )

Change subject: IMPALA-12729: Allow creating primary keys for Iceberg tables
..


Patch Set 8:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/15694/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/21149
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7bea787acdabd8cb04661f4ddb5c3309af0364a6
Gerrit-Change-Number: 21149
Gerrit-PatchSet: 8
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Wed, 27 Mar 2024 16:25:36 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12729: Allow creating primary keys for Iceberg tables

2024-03-27 Thread Gabor Kaszab (Code Review)
Hello Daniel Becker, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/21149

to look at the new patch set (#8).

Change subject: IMPALA-12729: Allow creating primary keys for Iceberg tables
..

IMPALA-12729: Allow creating primary keys for Iceberg tables

There are writer engines that use Iceberg's identifier-field-ids from
the Iceberg schema to identify the columns to be written into the
equality delete files (Flink, NiFi). So far Impala wasn't able to
populate this identifier-field-ids. This patch introduces the support
for not enforced primary keys for Iceberg tables, where the primary key
is going to be used for setting identifier-field-ids during Iceberg
schema creation.

Example syntax:
CREATE TABLE ice_tbl (
  i int NOT NULL,
  j int,
  s string NOT NULL
  primary key(i, s) not enforced)
PARTITIONED BY SPEC (truncate(10, s))
STORED AS ICEBERG;

There are some constraints with primary keys (PK) following the
behavior of Flink:
 - Only NOT NULL columns can be in the PK.
 - PK is not allowed in the column definition level like
   'i int NOT NULL PRIMARY KEY'.
 - If the table is partitioned then the partition columns have to be
   part of the PK.
 - Float and double columns are not allowed for the PK.
 - Not allowed to drop a column that is used as a PK.

Testing:
 - New E2E tests added for different table creation scenarios.
 - Manual test to use Nifi for writing into a table with PK.

Change-Id: I7bea787acdabd8cb04661f4ddb5c3309af0364a6
---
M fe/src/main/cup/sql-parser.cup
M fe/src/main/java/org/apache/impala/analysis/ColumnDef.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java
M fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java
M fe/src/main/java/org/apache/impala/analysis/TableDef.java
M fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergCtasTarget.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/util/IcebergSchemaConverter.java
M fe/src/main/jflex/sql-scanner.flex
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
M testdata/workloads/functional-query/queries/QueryTest/iceberg-create.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-negative.test
M tests/custom_cluster/test_lineage.py
15 files changed, 315 insertions(+), 55 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/49/21149/8
--
To view, visit http://gerrit.cloudera.org:8080/21149
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7bea787acdabd8cb04661f4ddb5c3309af0364a6
Gerrit-Change-Number: 21149
Gerrit-PatchSet: 8
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-12729: Allow creating primary keys for Iceberg tables

2024-03-27 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21149 )

Change subject: IMPALA-12729: Allow creating primary keys for Iceberg tables
..


Patch Set 7: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/21149
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7bea787acdabd8cb04661f4ddb5c3309af0364a6
Gerrit-Change-Number: 21149
Gerrit-PatchSet: 7
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Wed, 27 Mar 2024 15:40:13 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12729: Allow creating primary keys for Iceberg tables

2024-03-27 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21149 )

Change subject: IMPALA-12729: Allow creating primary keys for Iceberg tables
..


Patch Set 7:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/10440/ 
DRY_RUN=false


--
To view, visit http://gerrit.cloudera.org:8080/21149
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7bea787acdabd8cb04661f4ddb5c3309af0364a6
Gerrit-Change-Number: 21149
Gerrit-PatchSet: 7
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Wed, 27 Mar 2024 15:40:14 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12729: Allow creating primary keys for Iceberg tables

2024-03-27 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21149 )

Change subject: IMPALA-12729: Allow creating primary keys for Iceberg tables
..


Patch Set 6:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/15692/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/21149
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7bea787acdabd8cb04661f4ddb5c3309af0364a6
Gerrit-Change-Number: 21149
Gerrit-PatchSet: 6
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Wed, 27 Mar 2024 15:33:21 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12729: Allow creating primary keys for Iceberg tables

2024-03-27 Thread Daniel Becker (Code Review)
Daniel Becker has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21149 )

Change subject: IMPALA-12729: Allow creating primary keys for Iceberg tables
..


Patch Set 6: Code-Review+2

(1 comment)

Thanks Gábor!

http://gerrit.cloudera.org:8080/#/c/21149/3/testdata/workloads/functional-query/queries/QueryTest/iceberg-negative.test
File 
testdata/workloads/functional-query/queries/QueryTest/iceberg-negative.test:

http://gerrit.cloudera.org:8080/#/c/21149/3/testdata/workloads/functional-query/queries/QueryTest/iceberg-negative.test@915
PS3, Line 915:
> Well, Iceberg by design only throws RuntimeExceptions, and at the call site
I agree we should not make decisions based on the error messages.



--
To view, visit http://gerrit.cloudera.org:8080/21149
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7bea787acdabd8cb04661f4ddb5c3309af0364a6
Gerrit-Change-Number: 21149
Gerrit-PatchSet: 6
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Wed, 27 Mar 2024 15:17:20 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12729: Allow creating primary keys for Iceberg tables

2024-03-27 Thread Gabor Kaszab (Code Review)
Gabor Kaszab has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21149 )

Change subject: IMPALA-12729: Allow creating primary keys for Iceberg tables
..


Patch Set 6:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/21149/3/fe/src/main/java/org/apache/impala/analysis/TableDef.java
File fe/src/main/java/org/apache/impala/analysis/TableDef.java:

http://gerrit.cloudera.org:8080/#/c/21149/3/fe/src/main/java/org/apache/impala/analysis/TableDef.java@492
PS3, Line 492: if (isIcebergTable()) {
> Thanks. Maybe in the future Iceberg options will not be a subset of Kudu op
Done


http://gerrit.cloudera.org:8080/#/c/21149/3/testdata/workloads/functional-query/queries/QueryTest/iceberg-negative.test
File 
testdata/workloads/functional-query/queries/QueryTest/iceberg-negative.test:

http://gerrit.cloudera.org:8080/#/c/21149/3/testdata/workloads/functional-query/queries/QueryTest/iceberg-negative.test@915
PS3, Line 915:
> Ok, it it's difficult we can leave it as it is.
Well, Iceberg by design only throws RuntimeExceptions, and at the call sites, 
it's not that easy to identify what the issue was. I mean the error messages 
are good, but I don't want to make decisions based on the error message


http://gerrit.cloudera.org:8080/#/c/21149/5/testdata/workloads/functional-query/queries/QueryTest/iceberg-negative.test
File 
testdata/workloads/functional-query/queries/QueryTest/iceberg-negative.test:

http://gerrit.cloudera.org:8080/#/c/21149/5/testdata/workloads/functional-query/queries/QueryTest/iceberg-negative.test@890
PS5, Line 890: doubl
> "double" would be better in the name.
Done



--
To view, visit http://gerrit.cloudera.org:8080/21149
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7bea787acdabd8cb04661f4ddb5c3309af0364a6
Gerrit-Change-Number: 21149
Gerrit-PatchSet: 6
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Wed, 27 Mar 2024 15:10:43 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12729: Allow creating primary keys for Iceberg tables

2024-03-27 Thread Gabor Kaszab (Code Review)
Hello Daniel Becker, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/21149

to look at the new patch set (#6).

Change subject: IMPALA-12729: Allow creating primary keys for Iceberg tables
..

IMPALA-12729: Allow creating primary keys for Iceberg tables

There are writer engines that use Iceberg's identifier-field-ids from
the Iceberg schema to identify the columns to be written into the
equality delete files (Flink, NiFi). So far Impala wasn't able to
populate this identifier-field-ids. This patch introduces the support
for not enforced primary keys for Iceberg tables, where the primary key
is going to be used for setting identifier-field-ids during Iceberg
schema creation.

Example syntax:
CREATE TABLE ice_tbl (
  i int NOT NULL,
  j int,
  s string NOT NULL
  primary key(i, s) not enforced)
PARTITIONED BY SPEC (truncate(10, s))
STORED AS ICEBERG;

There are some constraints with primary keys (PK) following the
behavior of Flink:
 - Only NOT NULL columns can be in the PK.
 - PK is not allowed in the column definition level like
   'i int NOT NULL PRIMARY KEY'.
 - If the table is partitioned then the partition columns have to be
   part of the PK.
 - Float and double columns are not allowed for the PK.
 - Not allowed to drop a column that is used as a PK.

Testing:
 - New E2E tests added for different table creation scenarios.
 - Manual test to use Nifi for writing into a table with PK.

Change-Id: I7bea787acdabd8cb04661f4ddb5c3309af0364a6
---
M fe/src/main/cup/sql-parser.cup
M fe/src/main/java/org/apache/impala/analysis/ColumnDef.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java
M fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java
M fe/src/main/java/org/apache/impala/analysis/TableDef.java
M fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergCtasTarget.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/util/IcebergSchemaConverter.java
M fe/src/main/jflex/sql-scanner.flex
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
M testdata/workloads/functional-query/queries/QueryTest/iceberg-create.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-negative.test
M tests/custom_cluster/test_lineage.py
15 files changed, 315 insertions(+), 55 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/49/21149/6
--
To view, visit http://gerrit.cloudera.org:8080/21149
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7bea787acdabd8cb04661f4ddb5c3309af0364a6
Gerrit-Change-Number: 21149
Gerrit-PatchSet: 6
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-12729: Allow creating primary keys for Iceberg tables

2024-03-27 Thread Daniel Becker (Code Review)
Daniel Becker has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21149 )

Change subject: IMPALA-12729: Allow creating primary keys for Iceberg tables
..


Patch Set 5: Code-Review+1

(3 comments)

Only a few nits, otherqise LGTM.

http://gerrit.cloudera.org:8080/#/c/21149/3/fe/src/main/java/org/apache/impala/analysis/TableDef.java
File fe/src/main/java/org/apache/impala/analysis/TableDef.java:

http://gerrit.cloudera.org:8080/#/c/21149/3/fe/src/main/java/org/apache/impala/analysis/TableDef.java@492
PS3, Line 492: if (isIcebergTable()) {
> no. This is the case when the table is not Iceberg, not Kudu but still some
Thanks. Maybe in the future Iceberg options will not be a subset of Kudu 
options so also checking those could still make sense in order to be more 
future proof.


http://gerrit.cloudera.org:8080/#/c/21149/3/testdata/workloads/functional-query/queries/QueryTest/iceberg-negative.test
File 
testdata/workloads/functional-query/queries/QueryTest/iceberg-negative.test:

http://gerrit.cloudera.org:8080/#/c/21149/3/testdata/workloads/functional-query/queries/QueryTest/iceberg-negative.test@915
PS3, Line 915:
> this error msg in fact comes from the Iceberg library. I found it verbose e
Ok, it it's difficult we can leave it as it is.


http://gerrit.cloudera.org:8080/#/c/21149/5/testdata/workloads/functional-query/queries/QueryTest/iceberg-negative.test
File 
testdata/workloads/functional-query/queries/QueryTest/iceberg-negative.test:

http://gerrit.cloudera.org:8080/#/c/21149/5/testdata/workloads/functional-query/queries/QueryTest/iceberg-negative.test@890
PS5, Line 890: float
"double" would be better in the name.



--
To view, visit http://gerrit.cloudera.org:8080/21149
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7bea787acdabd8cb04661f4ddb5c3309af0364a6
Gerrit-Change-Number: 21149
Gerrit-PatchSet: 5
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Wed, 27 Mar 2024 13:02:59 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12729: Allow creating primary keys for Iceberg tables

2024-03-25 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21149 )

Change subject: IMPALA-12729: Allow creating primary keys for Iceberg tables
..


Patch Set 5:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/15655/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/21149
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7bea787acdabd8cb04661f4ddb5c3309af0364a6
Gerrit-Change-Number: 21149
Gerrit-PatchSet: 5
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Mon, 25 Mar 2024 17:28:54 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12729: Allow creating primary keys for Iceberg tables

2024-03-25 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21149 )

Change subject: IMPALA-12729: Allow creating primary keys for Iceberg tables
..


Patch Set 4:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/15652/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/21149
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7bea787acdabd8cb04661f4ddb5c3309af0364a6
Gerrit-Change-Number: 21149
Gerrit-PatchSet: 4
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Mon, 25 Mar 2024 17:22:36 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12729: Allow creating primary keys for Iceberg tables

2024-03-25 Thread Gabor Kaszab (Code Review)
Gabor Kaszab has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21149 )

Change subject: IMPALA-12729: Allow creating primary keys for Iceberg tables
..


Patch Set 5:

(15 comments)

http://gerrit.cloudera.org:8080/#/c/21149/3/fe/src/main/java/org/apache/impala/analysis/ColumnDef.java
File fe/src/main/java/org/apache/impala/analysis/ColumnDef.java:

http://gerrit.cloudera.org:8080/#/c/21149/3/fe/src/main/java/org/apache/impala/analysis/ColumnDef.java@181
PS3, Line 181:  if the column has options th
> What are these incompatible with? Could you clarify the name or add a comme
Done


http://gerrit.cloudera.org:8080/#/c/21149/3/fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java
File fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java:

http://gerrit.cloudera.org:8080/#/c/21149/3/fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java@180
PS3, Line 180:
> Could this be an ELSE IF?
Done


http://gerrit.cloudera.org:8080/#/c/21149/3/fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java@180
PS3, Line 180:   isNullable_ = icebergColumn.isNullable();
> We could add a comment like on L174, or extend that one.
Done


http://gerrit.cloudera.org:8080/#/c/21149/3/fe/src/main/java/org/apache/impala/analysis/TableDef.java
File fe/src/main/java/org/apache/impala/analysis/TableDef.java:

http://gerrit.cloudera.org:8080/#/c/21149/3/fe/src/main/java/org/apache/impala/analysis/TableDef.java@482
PS3, Line 482:
> Nit: 'tables have' would be better, now that we modify this comment.
Done


http://gerrit.cloudera.org:8080/#/c/21149/3/fe/src/main/java/org/apache/impala/analysis/TableDef.java@490
PS3, Line 490: if (
> Nit: ELSE IF should go on the previous line.
Done


http://gerrit.cloudera.org:8080/#/c/21149/3/fe/src/main/java/org/apache/impala/analysis/TableDef.java@492
PS3, Line 492: if (isIcebergTable()) {
> Why don't we need a similar clause for Iceberg? A comment could clarify.
added a comment


http://gerrit.cloudera.org:8080/#/c/21149/3/fe/src/main/java/org/apache/impala/analysis/TableDef.java@492
PS3, Line 492: if (isIcebergTable()) {
> Shouldn't we include this condition and return false in the "isIcebergTable
no. This is the case when the table is not Iceberg, not Kudu but still some 
Kudu column options are provided. The reason we don't have to check for Iceberg 
options here is that Iceberg options are subset of Kudu options.


http://gerrit.cloudera.org:8080/#/c/21149/3/fe/src/main/java/org/apache/impala/analysis/TableDef.java@572
PS3, Line 572: } else if (!primaryKeyColNames_.isEmpty() && 
!isPrimaryKeyUnique()
> If non-unique primary keys are now supported in Iceberg, shouldn't we modif
L507 is a bit different. There are 2 ways to provide primary keys:
1: CREATE TABLE (i int not null primary key, j int);
2: CREATE TABLE (i int not null, j int, primary key(i));

With this patch we only support 2). That is more flexible btw, because you can 
add multiple cols into the PK, while with 1) you can't.

Changed the comment at L89.


http://gerrit.cloudera.org:8080/#/c/21149/3/fe/src/main/java/org/apache/impala/analysis/TableDef.java@577
PS3, Line 577:
> Are not enforced and non unique PKs the same?
isPrimaryKeyUnique_ stores the info if NOT UNIQUE was provided or not. Added a 
comment about this at L89.


http://gerrit.cloudera.org:8080/#/c/21149/3/fe/src/main/java/org/apache/impala/util/IcebergSchemaConverter.java
File fe/src/main/java/org/apache/impala/util/IcebergSchemaConverter.java:

http://gerrit.cloudera.org:8080/#/c/21149/3/fe/src/main/java/org/apache/impala/util/IcebergSchemaConverter.java@174
PS3, Line 174: given columns
> Could mention 'primaryKeyColumnNames'.
Done


http://gerrit.cloudera.org:8080/#/c/21149/3/fe/src/main/java/org/apache/impala/util/IcebergSchemaConverter.java@192
PS3, Line 192:
> Nit: Field.
Done


http://gerrit.cloudera.org:8080/#/c/21149/3/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
File fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java:

http://gerrit.cloudera.org:8080/#/c/21149/3/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java@2446
PS3, Line 2446: not enforced
> Is this a breaking change if this test used to work without "not enforced"?
well, previously Iceberg tables accepted primary key but it was simply ignored. 
I don't think this qualifies for a breaking change but let me know if you feel 
otherwise. Anyway this mimics the SQL syntax in Flink.


http://gerrit.cloudera.org:8080/#/c/21149/3/testdata/workloads/functional-query/queries/QueryTest/iceberg-create.test
File testdata/workloads/functional-query/queries/QueryTest/iceberg-create.test:

http://gerrit.cloudera.org:8080/#/c/21149/3/testdata/workloads/functional-query/queries/QueryTest/iceberg-create.test@619
PS3, Line 619:  QUERY
> Could you add comments for the cases which don't have them? It would be inf
Done



[Impala-ASF-CR] IMPALA-12729: Allow creating primary keys for Iceberg tables

2024-03-25 Thread Gabor Kaszab (Code Review)
Hello Daniel Becker, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/21149

to look at the new patch set (#5).

Change subject: IMPALA-12729: Allow creating primary keys for Iceberg tables
..

IMPALA-12729: Allow creating primary keys for Iceberg tables

There are writer engines that use Iceberg's identifier-field-ids from
the Iceberg schema to identify the columns to be written into the
equality delete files (Flink, NiFi). So far Impala wasn't able to
populate this identifier-field-ids. This patch introduces the support
for not enforced primary keys for Iceberg tables, where the primary key
is going to be used for setting identifier-field-ids during Iceberg
schema creation.

Example syntax:
CREATE TABLE ice_tbl (
  i int NOT NULL,
  j int,
  s string NOT NULL
  primary key(i, s) not enforced)
PARTITIONED BY SPEC (truncate(10, s))
STORED AS ICEBERG;

There are some constraints with primary keys (PK) following the
behavior of Flink:
 - Only NOT NULL columns can be in the PK.
 - PK is not allowed in the column definition level like
   'i int NOT NULL PRIMARY KEY'.
 - If the table is partitioned then the partition columns have to be
   part of the PK.
 - Float and double columns are not allowed for the PK.
 - Not allowed to drop a column that is used as a PK.

Testing:
 - New E2E tests added for different table creation scenarios.
 - Manual test to use Nifi for writing into a table with PK.

Change-Id: I7bea787acdabd8cb04661f4ddb5c3309af0364a6
---
M fe/src/main/cup/sql-parser.cup
M fe/src/main/java/org/apache/impala/analysis/ColumnDef.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java
M fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java
M fe/src/main/java/org/apache/impala/analysis/TableDef.java
M fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergCtasTarget.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/util/IcebergSchemaConverter.java
M fe/src/main/jflex/sql-scanner.flex
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
M testdata/workloads/functional-query/queries/QueryTest/iceberg-create.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-negative.test
M tests/custom_cluster/test_lineage.py
15 files changed, 313 insertions(+), 56 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/49/21149/5
--
To view, visit http://gerrit.cloudera.org:8080/21149
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7bea787acdabd8cb04661f4ddb5c3309af0364a6
Gerrit-Change-Number: 21149
Gerrit-PatchSet: 5
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-12729: Allow creating primary keys for Iceberg tables

2024-03-25 Thread Gabor Kaszab (Code Review)
Hello Daniel Becker, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/21149

to look at the new patch set (#4).

Change subject: IMPALA-12729: Allow creating primary keys for Iceberg tables
..

IMPALA-12729: Allow creating primary keys for Iceberg tables

There are writer engines that use Iceberg's identifier-field-ids from
the Iceberg schema to identify the columns to be written into the
equality delete files (Flink, NiFi). So far Impala wasn't able to
populate this identifier-field-ids. This patch introduces the support
for not enforced primary keys for Iceberg tables, where the primary key
is going to be used for setting identifier-field-ids during Iceberg
schema creation.

Example syntax:
CREATE TABLE ice_tbl (
  i int NOT NULL,
  j int,
  s string NOT NULL
  primary key(i, s) not enforced)
PARTITIONED BY SPEC (truncate(10, s))
STORED AS ICEBERG;

There are some constraints with primary keys (PK) following the
behavior of Flink:
 - Only NOT NULL columns can be in the PK.
 - PK is not allowed in the column definition level like
   'i int NOT NULL PRIMARY KEY'.
 - If the table is partitioned then the partition columns have to be
   part of the PK.
 - Float and double columns are not allowed for the PK.
 - Not allowed to drop a column that is used as a PK.

Testing:
 - New E2E tests added for different table creation scenarios.
 - Manual test to use Nifi for writing into a table with PK.

Change-Id: I7bea787acdabd8cb04661f4ddb5c3309af0364a6
---
M fe/src/main/cup/sql-parser.cup
M fe/src/main/java/org/apache/impala/analysis/ColumnDef.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java
M fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java
M fe/src/main/java/org/apache/impala/analysis/TableDef.java
M fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergCtasTarget.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/util/IcebergSchemaConverter.java
M fe/src/main/jflex/sql-scanner.flex
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
M testdata/workloads/functional-query/queries/QueryTest/iceberg-create.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-negative.test
M tests/custom_cluster/test_lineage.py
15 files changed, 311 insertions(+), 56 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/49/21149/4
--
To view, visit http://gerrit.cloudera.org:8080/21149
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7bea787acdabd8cb04661f4ddb5c3309af0364a6
Gerrit-Change-Number: 21149
Gerrit-PatchSet: 4
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-12729: Allow creating primary keys for Iceberg tables

2024-03-25 Thread Daniel Becker (Code Review)
Daniel Becker has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21149 )

Change subject: IMPALA-12729: Allow creating primary keys for Iceberg tables
..


Patch Set 3:

(15 comments)

Thanks for this change, Gábor!

http://gerrit.cloudera.org:8080/#/c/21149/3/fe/src/main/java/org/apache/impala/analysis/ColumnDef.java
File fe/src/main/java/org/apache/impala/analysis/ColumnDef.java:

http://gerrit.cloudera.org:8080/#/c/21149/3/fe/src/main/java/org/apache/impala/analysis/ColumnDef.java@181
PS3, Line 181: hasIncompatibleIcebergOptions
What are these incompatible with? Could you clarify the name or add a comment?


http://gerrit.cloudera.org:8080/#/c/21149/3/fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java
File fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java:

http://gerrit.cloudera.org:8080/#/c/21149/3/fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java@180
PS3, Line 180: if (path_.destColumn() instanceof IcebergColumn) {
We could add a comment like on L174, or extend that one.


http://gerrit.cloudera.org:8080/#/c/21149/3/fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java@180
PS3, Line 180: if
Could this be an ELSE IF?


http://gerrit.cloudera.org:8080/#/c/21149/3/fe/src/main/java/org/apache/impala/analysis/TableDef.java
File fe/src/main/java/org/apache/impala/analysis/TableDef.java:

http://gerrit.cloudera.org:8080/#/c/21149/3/fe/src/main/java/org/apache/impala/analysis/TableDef.java@482
PS3, Line 482: table has
Nit: 'tables have' would be better, now that we modify this comment.


http://gerrit.cloudera.org:8080/#/c/21149/3/fe/src/main/java/org/apache/impala/analysis/TableDef.java@490
PS3, Line 490: else
Nit: ELSE IF should go on the previous line.


http://gerrit.cloudera.org:8080/#/c/21149/3/fe/src/main/java/org/apache/impala/analysis/TableDef.java@492
PS3, Line 492: if (columnDef.hasKuduOptions()
Why don't we need a similar clause for Iceberg? A comment could clarify.


http://gerrit.cloudera.org:8080/#/c/21149/3/fe/src/main/java/org/apache/impala/analysis/TableDef.java@492
PS3, Line 492: if (columnDef.hasKuduOptions())
Shouldn't we include this condition and return false in the "isIcebergTable()" 
branch? In case someone specifies a Kudu option for an Iceberg table.


http://gerrit.cloudera.org:8080/#/c/21149/3/fe/src/main/java/org/apache/impala/analysis/TableDef.java@572
PS3, Line 572:   " is only supported for Kudu and Iceberg.");
If non-unique primary keys are now supported in Iceberg, shouldn't we modify 
L507 also, and the comment of 'isPrimaryKeyUnique_' on L89?


http://gerrit.cloudera.org:8080/#/c/21149/3/fe/src/main/java/org/apache/impala/analysis/TableDef.java@577
PS3, Line 577: NOT ENFORCED
Are not enforced and non unique PKs the same?


http://gerrit.cloudera.org:8080/#/c/21149/3/fe/src/main/java/org/apache/impala/util/IcebergSchemaConverter.java
File fe/src/main/java/org/apache/impala/util/IcebergSchemaConverter.java:

http://gerrit.cloudera.org:8080/#/c/21149/3/fe/src/main/java/org/apache/impala/util/IcebergSchemaConverter.java@174
PS3, Line 174: given columns
Could mention 'primaryKeyColumnNames'.


http://gerrit.cloudera.org:8080/#/c/21149/3/fe/src/main/java/org/apache/impala/util/IcebergSchemaConverter.java@192
PS3, Line 192: Filed
Nit: Field.


http://gerrit.cloudera.org:8080/#/c/21149/3/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
File fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java:

http://gerrit.cloudera.org:8080/#/c/21149/3/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java@2446
PS3, Line 2446: not enforced
Is this a breaking change if this test used to work without "not enforced"?


http://gerrit.cloudera.org:8080/#/c/21149/3/testdata/workloads/functional-query/queries/QueryTest/iceberg-create.test
File testdata/workloads/functional-query/queries/QueryTest/iceberg-create.test:

http://gerrit.cloudera.org:8080/#/c/21149/3/testdata/workloads/functional-query/queries/QueryTest/iceberg-create.test@619
PS3, Line 619:  QUERY
Could you add comments for the cases which don't have them? It would be 
informative.


http://gerrit.cloudera.org:8080/#/c/21149/3/testdata/workloads/functional-query/queries/QueryTest/iceberg-negative.test
File 
testdata/workloads/functional-query/queries/QueryTest/iceberg-negative.test:

http://gerrit.cloudera.org:8080/#/c/21149/3/testdata/workloads/functional-query/queries/QueryTest/iceberg-negative.test@887
PS3, Line 887: double
Could also add a similar test with double.


http://gerrit.cloudera.org:8080/#/c/21149/3/testdata/workloads/functional-query/queries/QueryTest/iceberg-negative.test@915
PS3, Line 915: required int
I think it could be interpreted as we can't delete 'i' because it's NOT NULL 
("required field"). Can we give  an error message that mentions it is a primary 
key?



--
To view, visit http://gerrit.cloudera.org:8080/21149
To 

[Impala-ASF-CR] IMPALA-12729: Allow creating primary keys for Iceberg tables

2024-03-21 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21149 )

Change subject: IMPALA-12729: Allow creating primary keys for Iceberg tables
..


Patch Set 3:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/15605/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/21149
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7bea787acdabd8cb04661f4ddb5c3309af0364a6
Gerrit-Change-Number: 21149
Gerrit-PatchSet: 3
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Thu, 21 Mar 2024 14:20:26 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12729: Allow creating primary keys for Iceberg tables

2024-03-21 Thread Gabor Kaszab (Code Review)
Hello Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/21149

to look at the new patch set (#3).

Change subject: IMPALA-12729: Allow creating primary keys for Iceberg tables
..

IMPALA-12729: Allow creating primary keys for Iceberg tables

There are writer engines that use Iceberg's identifier-field-ids from
the Iceberg schema to identify the columns to be written into the
equality delete files (Flink, NiFi). So far Impala wasn't able to
populate this identifier-field-ids. This patch introduces the support
for not enforced primary keys for Iceberg tables, where the primary key
is going to be used for setting identifier-field-ids during Iceberg
schema creation.

Example syntax:
CREATE TABLE ice_tbl (
  i int NOT NULL,
  j int,
  s string NOT NULL
  primary key(i, s) not enforced)
PARTITIONED BY SPEC (truncate(10, s))
STORED AS ICEBERG;

There are some constraints with primary keys (PK) following the
behavior of Flink:
 - Only NOT NULL columns can be in the PK.
 - PK is not allowed in the column definition level like
   'i int NOT NULL PRIMARY KEY'.
 - If the table is partitioned then the partition columns have to be
   part of the PK.
 - Float and double columns are not allowed for the PK.
 - Not allowed to drop a column that is used as a PK.

Testing:
 - New E2E tests added for different table creation scenarios.
 - Manual test to use Nifi for writing into a table with PK.

Change-Id: I7bea787acdabd8cb04661f4ddb5c3309af0364a6
---
M fe/src/main/cup/sql-parser.cup
M fe/src/main/java/org/apache/impala/analysis/ColumnDef.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java
M fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java
M fe/src/main/java/org/apache/impala/analysis/TableDef.java
M fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergCtasTarget.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/util/IcebergSchemaConverter.java
M fe/src/main/jflex/sql-scanner.flex
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
M testdata/workloads/functional-query/queries/QueryTest/iceberg-create.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-negative.test
M tests/custom_cluster/test_lineage.py
15 files changed, 280 insertions(+), 52 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/49/21149/3
--
To view, visit http://gerrit.cloudera.org:8080/21149
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7bea787acdabd8cb04661f4ddb5c3309af0364a6
Gerrit-Change-Number: 21149
Gerrit-PatchSet: 3
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-12729: Allow creating primary keys for Iceberg tables

2024-03-19 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21149 )

Change subject: IMPALA-12729: Allow creating primary keys for Iceberg tables
..


Patch Set 2:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/15559/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/21149
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7bea787acdabd8cb04661f4ddb5c3309af0364a6
Gerrit-Change-Number: 21149
Gerrit-PatchSet: 2
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Tue, 19 Mar 2024 16:15:33 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12729: Allow creating primary keys for Iceberg tables

2024-03-19 Thread Gabor Kaszab (Code Review)
Hello Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/21149

to look at the new patch set (#2).

Change subject: IMPALA-12729: Allow creating primary keys for Iceberg tables
..

IMPALA-12729: Allow creating primary keys for Iceberg tables

There are writer engines that use Iceberg's identifier-field-ids from
the Iceberg schema to identify the columns to be written into the
equality delete files (Flink, NiFi). So far Impala wasn't able to
populate this identifier-field-ids. This patch introduces the support
for not enforced primary keys for Iceberg tables, where the primary key
is going to be used for setting identifier-field-ids during Iceberg
schema creation.

Example syntax:
CREATE TABLE ice_tbl (
  i int NOT NULL,
  j int,
  s string NOT NULL
  primary key(i, s) not enforced)
PARTITIONED BY SPEC (truncate(10, s))
STORED AS ICEBERG;

There are some constraints with primary keys (PK) following the
behavior of Flink:
 - Only NOT NULL columns can be in the PK.
 - PK is not allowed in the column definition level like
   'i int NOT NULL PRIMARY KEY'.
 - If the table is partitioned then the partition columns have to be
   part of the PK.
 - Float and double columns are not allowed for the PK.
 - Not allowed to drop a column that is used as a PK.

Testing:
 - New E2E tests added for different table creation scenarios.
 - Manual test to use Nifi for writing into a table with PK.

Change-Id: I7bea787acdabd8cb04661f4ddb5c3309af0364a6
---
M fe/src/main/cup/sql-parser.cup
M fe/src/main/java/org/apache/impala/analysis/ColumnDef.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java
M fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java
M fe/src/main/java/org/apache/impala/analysis/TableDef.java
M fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergCtasTarget.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/util/IcebergSchemaConverter.java
M fe/src/main/jflex/sql-scanner.flex
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
M testdata/workloads/functional-query/queries/QueryTest/iceberg-create.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-negative.test
14 files changed, 272 insertions(+), 49 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/49/21149/2
--
To view, visit http://gerrit.cloudera.org:8080/21149
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7bea787acdabd8cb04661f4ddb5c3309af0364a6
Gerrit-Change-Number: 21149
Gerrit-PatchSet: 2
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-12729: Allow creating primary keys for Iceberg tables

2024-03-14 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21149 )

Change subject: IMPALA-12729: Allow creating primary keys for Iceberg tables
..


Patch Set 1:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/15511/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/21149
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7bea787acdabd8cb04661f4ddb5c3309af0364a6
Gerrit-Change-Number: 21149
Gerrit-PatchSet: 1
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Thu, 14 Mar 2024 16:57:34 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12729: Allow creating primary keys for Iceberg tables

2024-03-14 Thread Gabor Kaszab (Code Review)
Gabor Kaszab has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/21149


Change subject: IMPALA-12729: Allow creating primary keys for Iceberg tables
..

IMPALA-12729: Allow creating primary keys for Iceberg tables

There are writer engines that use Iceberg's identifier-field-ids from
the Iceberg schema to identify the columns to be written into the
equality delete files (Flink, NiFi). So far Impala wasn't able to
populate this identifier-field-ids. This patch introduces the support
for not enforced primary keys for Iceberg tables, where the primary key
is going to be used for setting identifier-field-ids during Iceberg
schema creation.

Example syntax:
CREATE TABLE ice_tbl (
  i int NOT NULL,
  j int,
  s string NOT NULL
  primary key(i, s) not enforced)
PARTITIONED BY SPEC (truncate(10, s))
STORED AS ICEBERG;

There are some constraints with primary keys (PK) following the
behavior of Flink:
 - Only NOT NULL columns can be in the PK.
 - PK is not allowed in the column definition level like
   'i int NOT NULL PRIMARY KEY'.
 - If the table is partitioned then the partition columns have to be
   part of the PK.
 - Float and double columns are not allowed for the PK.
 - Not allowed to drop a column that is used as a PK.

Testing:
 - New E2E tests added for different table creation scenarios.
 - Manual test to use Nifi for writing into a table with PK.

Change-Id: I7bea787acdabd8cb04661f4ddb5c3309af0364a6
---
M fe/src/main/cup/sql-parser.cup
M fe/src/main/java/org/apache/impala/analysis/ColumnDef.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java
M fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java
M fe/src/main/java/org/apache/impala/analysis/TableDef.java
M fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergCtasTarget.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/util/IcebergSchemaConverter.java
M fe/src/main/jflex/sql-scanner.flex
M testdata/workloads/functional-query/queries/QueryTest/iceberg-create.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-negative.test
13 files changed, 261 insertions(+), 48 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/49/21149/1
--
To view, visit http://gerrit.cloudera.org:8080/21149
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I7bea787acdabd8cb04661f4ddb5c3309af0364a6
Gerrit-Change-Number: 21149
Gerrit-PatchSet: 1
Gerrit-Owner: Gabor Kaszab 


[Impala-ASF-CR] IMPALA-12729: Allow creating primary keys for Iceberg tables

2024-03-14 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21149 )

Change subject: IMPALA-12729: Allow creating primary keys for Iceberg tables
..


Patch Set 1:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/21149/1/fe/src/main/java/org/apache/impala/analysis/TableDef.java
File fe/src/main/java/org/apache/impala/analysis/TableDef.java:

http://gerrit.cloudera.org:8080/#/c/21149/1/fe/src/main/java/org/apache/impala/analysis/TableDef.java@581
PS1, Line 581:   throw new AnalysisException("Partition columns have to 
be part of the primary " +
line too long (91 > 90)


http://gerrit.cloudera.org:8080/#/c/21149/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

http://gerrit.cloudera.org:8080/#/c/21149/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@3909
PS1, Line 3909: partitionSpec, primaryKeyColumnNames, 
newTable.getOwner(), tableProperties).location();
line too long (103 > 90)



--
To view, visit http://gerrit.cloudera.org:8080/21149
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7bea787acdabd8cb04661f4ddb5c3309af0364a6
Gerrit-Change-Number: 21149
Gerrit-PatchSet: 1
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Thu, 14 Mar 2024 16:32:18 +
Gerrit-HasComments: Yes