Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/16446 )
Change subject: IMPALA-10164: Supporting HadoopCatalog for Iceberg table ...................................................................... Patch Set 13: Hi WangSheng, Yes, it's what I meant, I only have minor corrections, answered inline: > 1. We prohibit explicitly setting the table LOCATION when using > 'hadoop.catalog' Yes, I think we should prohibit it and let Iceberg decide the location. > 2. When execute 'SHOW CREATE TABLE', table location is > 'hdfs://test-warehouse/ice-catalog/default/ice_t' If we prohibit setting table location, then we can't output it in SHOW CREATE TABLE. > 3. When execute 'DESCRIBE FORMATTED', table location is > 'hdfs://test-warehouse/ice-catalog/default/ice_t', and > 'iceberg.catalog_location' is 'hdfs://test-warehouse/ice-catalog' Yes. > 4. When execute 'DROP TABLE', we only delete > 'hdfs://test-warehouse/ice-catalog/default/ice_t', and reserve > 'hdfs://test-warehouse/ice-catalog' Yes. > 5. 'iceberg.catalog_location' is necessary for both managed and external > 'hadoop.catalog' Iceberg table Yes. Maybe later we could also have a default location somewhere under the warehouse directory. But for now I think explicit catalog locations are better. > 6. When loading managed table, we use > load('iceberg.catalog_location',TableIdentifier(db.tbl)) Yes. > 7. When loading external table, we use > load('iceberg.catalog_location',TableIdentifier('iceberg.table_name')) Yes, maybe rename 'iceberg.table_name' to 'iceberg.table_identifier'? And the value of it should be a dot-separated string of names that was originally used to create the Iceberg table. Iceberg can already make table identifiers from dot-separated strings: https://github.com/apache/iceberg/blob/1a797cd986c9193f2e422fec295aaf25ce7e1916/api/src/main/java/org/apache/iceberg/catalog/TableIdentifier.java#L47 > > Does my understanding correct? > > And still some questions: > > 1. When creating external 'hadoop.catalog' Iceberg table, are we load table > by 'iceberg.catalog_location' and 'iceberg.table_name'? If so, both these two > properties are necessary. So instead of 'iceberg.table_name' we could have 'iceberg.table_identifier', just to use Iceberg terminology and semantics. Yeah, probably we want it to be a required property for EXTERNAL hadoop catalog tables. I just wonder if it would make sense to infer it from 'db_name.tbl_name'. But let's start with it as a required property, then later we'll se if inferring it makes sense and add support for it in a future patch. > 2. Can we set the table LOCATION for 'hadoop.catalog' external table? If not, > how do we define table LOCATION? Using > 'iceberg.catalog_location'+/db/external_tbl? Or use > 'iceberg.catalog_location'+'iceberg.table_name'? I think we should not set LOCATION. Iceberg uses catalog location + table identifier to find and load tables, so I think we should provide those: 'iceberg.catalog_location'+'iceberg.table_identifier' And if Iceberg can load the table with the given information, we can set the actual table location. If the table is not found, we should raise an error. -- To view, visit http://gerrit.cloudera.org:8080/16446 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic1893c50a633ca22d4bca6726c9937b026f5d5ef Gerrit-Change-Number: 16446 Gerrit-PatchSet: 13 Gerrit-Owner: wangsheng <sky...@163.com> Gerrit-Reviewer: Gabor Kaszab <gaborkas...@cloudera.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com> Gerrit-Reviewer: wangsheng <sky...@163.com> Gerrit-Comment-Date: Fri, 25 Sep 2020 16:30:17 +0000 Gerrit-HasComments: No