[Impala-ASF-CR] IMPALA-10215: Implement INSERT INTO for non-partitioned Iceberg tables (Parquet)

Csaba Ringhofer (Code Review) Mon, 19 Oct 2020 08:05:02 -0700

Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16545 )


Change subject: IMPALA-10215: Implement INSERT INTO for non-partitioned Iceberg 
tables (Parquet)
......................................................................


Patch Set 3:

(8 comments)

http://gerrit.cloudera.org:8080/#/c/16545/3/be/src/runtime/coordinator.cc
File be/src/runtime/coordinator.cc:

http://gerrit.cloudera.org:8080/#/c/16545/3/be/src/runtime/coordinator.cc@786
PS3, Line 786: is_transactional
is_transactional became a bit misleading IMO, as Iceberg is also kind of 
transactional - it has an API to do multiple operations in a transaction 
(https://iceberg.apache.org/api/#transactions), and while we don't use it, we 
also don't support multiple statements in a Hive ACID transactions.

We could rename it to is_hive_acid, or probably it would the best to have an 
enum with non_transactional/hive_acid_insert_only/hive_acid_full/iceberg values.

We use transactional as a synonym of Hive ACID in a lot of places, I don't 
think that we should replace them in this patch, but coming up with good names 
would be nice.


http://gerrit.cloudera.org:8080/#/c/16545/3/be/src/runtime/coordinator.cc@800
PS3, Line 800:     RETURN_IF_ERROR(DeleteQueryLevelStagingDir());
Won't we incorrectly call this in case of Iceberg tables?


http://gerrit.cloudera.org:8080/#/c/16545/3/be/src/service/client-request-state.cc
File be/src/service/client-request-state.cc:

http://gerrit.cloudera.org:8080/#/c/16545/3/be/src/service/client-request-state.cc@1213
PS3, Line 1213:   if (per_part_map.size() != 1) return ret;
If this is always the case than wouldn't be a DCHECK better?


http://gerrit.cloudera.org:8080/#/c/16545/3/fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergHadoopCatalog.java
File 
fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergHadoopCatalog.java:

http://gerrit.cloudera.org:8080/#/c/16545/3/fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergHadoopCatalog.java@96
PS3, Line 96:       String.format("Failed to load Iceberg table with id: %s", 
tableId));
nit: +2 indentation


http://gerrit.cloudera.org:8080/#/c/16545/3/fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergHadoopTables.java
File 
fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergHadoopTables.java:

http://gerrit.cloudera.org:8080/#/c/16545/3/fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergHadoopTables.java@101
PS3, Line 101:       String.format("Failed to load Iceberg table at location: 
%s", tableLocation));
nit: +2 indentation


http://gerrit.cloudera.org:8080/#/c/16545/3/tests/stress/test_insert_stress.py
File tests/stress/test_insert_stress.py:

http://gerrit.cloudera.org:8080/#/c/16545/3/tests/stress/test_insert_stress.py@50
PS3, Line 50:       while num_inserts < 30:
Isn't it a problem that this affect ACID tests too? There could be a parameter 
like 'num_writes' to set this.


http://gerrit.cloudera.org:8080/#/c/16545/3/tests/stress/test_insert_stress.py@114
PS3, Line 114:     self.client.execute("drop table if exists %s" % tbl_name)
Is this really needed? We should never use the same unique_database twice.


http://gerrit.cloudera.org:8080/#/c/16545/3/tests/stress/test_insert_stress.py@124
PS3, Line 124:     writers = [Task(self._impala_role_concurrent_writer, 
tbl_name, i, counter)
Yeah, it's good to see that the infrastructure for ACID inserts can be reused 
so easily :)



--
To view, visit http://gerrit.cloudera.org:8080/16545
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5690fb6c2cc51f0033fa26caf8597c80a11bcd8e
Gerrit-Change-Number: 16545
Gerrit-PatchSet: 3
Gerrit-Owner: Zoltan Borok-Nagy <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Gabor Kaszab <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>
Gerrit-Reviewer: wangsheng <[email protected]>
Gerrit-Comment-Date: Mon, 19 Oct 2020 15:03:25 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10215: Implement INSERT INTO for non-partitioned Iceberg tables (Parquet)

Reply via email to