Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/16545 )
Change subject: IMPALA-10215: Implement INSERT INTO for non-partitioned Iceberg tables (Parquet) ...................................................................... Patch Set 3: (8 comments) http://gerrit.cloudera.org:8080/#/c/16545/3/be/src/runtime/coordinator.cc File be/src/runtime/coordinator.cc: http://gerrit.cloudera.org:8080/#/c/16545/3/be/src/runtime/coordinator.cc@786 PS3, Line 786: is_transactional is_transactional became a bit misleading IMO, as Iceberg is also kind of transactional - it has an API to do multiple operations in a transaction (https://iceberg.apache.org/api/#transactions), and while we don't use it, we also don't support multiple statements in a Hive ACID transactions. We could rename it to is_hive_acid, or probably it would the best to have an enum with non_transactional/hive_acid_insert_only/hive_acid_full/iceberg values. We use transactional as a synonym of Hive ACID in a lot of places, I don't think that we should replace them in this patch, but coming up with good names would be nice. http://gerrit.cloudera.org:8080/#/c/16545/3/be/src/runtime/coordinator.cc@800 PS3, Line 800: RETURN_IF_ERROR(DeleteQueryLevelStagingDir()); Won't we incorrectly call this in case of Iceberg tables? http://gerrit.cloudera.org:8080/#/c/16545/3/be/src/service/client-request-state.cc File be/src/service/client-request-state.cc: http://gerrit.cloudera.org:8080/#/c/16545/3/be/src/service/client-request-state.cc@1213 PS3, Line 1213: if (per_part_map.size() != 1) return ret; If this is always the case than wouldn't be a DCHECK better? http://gerrit.cloudera.org:8080/#/c/16545/3/fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergHadoopCatalog.java File fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergHadoopCatalog.java: http://gerrit.cloudera.org:8080/#/c/16545/3/fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergHadoopCatalog.java@96 PS3, Line 96: String.format("Failed to load Iceberg table with id: %s", tableId)); nit: +2 indentation http://gerrit.cloudera.org:8080/#/c/16545/3/fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergHadoopTables.java File fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergHadoopTables.java: http://gerrit.cloudera.org:8080/#/c/16545/3/fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergHadoopTables.java@101 PS3, Line 101: String.format("Failed to load Iceberg table at location: %s", tableLocation)); nit: +2 indentation http://gerrit.cloudera.org:8080/#/c/16545/3/tests/stress/test_insert_stress.py File tests/stress/test_insert_stress.py: http://gerrit.cloudera.org:8080/#/c/16545/3/tests/stress/test_insert_stress.py@50 PS3, Line 50: while num_inserts < 30: Isn't it a problem that this affect ACID tests too? There could be a parameter like 'num_writes' to set this. http://gerrit.cloudera.org:8080/#/c/16545/3/tests/stress/test_insert_stress.py@114 PS3, Line 114: self.client.execute("drop table if exists %s" % tbl_name) Is this really needed? We should never use the same unique_database twice. http://gerrit.cloudera.org:8080/#/c/16545/3/tests/stress/test_insert_stress.py@124 PS3, Line 124: writers = [Task(self._impala_role_concurrent_writer, tbl_name, i, counter) Yeah, it's good to see that the infrastructure for ACID inserts can be reused so easily :) -- To view, visit http://gerrit.cloudera.org:8080/16545 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I5690fb6c2cc51f0033fa26caf8597c80a11bcd8e Gerrit-Change-Number: 16545 Gerrit-PatchSet: 3 Gerrit-Owner: Zoltan Borok-Nagy <[email protected]> Gerrit-Reviewer: Csaba Ringhofer <[email protected]> Gerrit-Reviewer: Gabor Kaszab <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]> Gerrit-Reviewer: wangsheng <[email protected]> Gerrit-Comment-Date: Mon, 19 Oct 2020 15:03:25 +0000 Gerrit-HasComments: Yes
