Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/16446 )
Change subject: IMPALA-10164: Supporting HadoopCatalog for Iceberg table ...................................................................... Patch Set 13: Hi WangSheng, thank you for your reply. I think we should choose the option that causes the least confusions. With the current solution the users can create two tables in the following way: CREATE TABLE ice_1 (i int) STORED AS ICEBERG LOCATION 'hdfs://test-warehouse/ice-catalog'; CREATE TABLE ice_2 (s string) STORED AS ICEBERG LOCATION 'hdfs://test-warehouse/ice-catalog'; ice_1 and ice_2 will be created in the same hadoop catalog, but they can contain their own data under their own (implicit) table location. I think it's a valid use case as iceberg supports creating multiple tables in the same catalog. Moreover, it's probably how Iceberg catalogs are meant to be used. Now if the user DROPs one of them, HMS will remove the whole catalog, possibly causing an unintended data loss. Note that this case is different than having two managed PARQUET tables based on same location, because in that case the tables point to the same data. Also I cannot think of a use case when users should do that. I propose the following: we should introduce a new table property: 'iceberg.catalog_location'. So users would create tables with the following statement if they want to use hadoop catalog: CREATE TABLE ice_t (i int) STORED AS ICEBERG TBLPROPERTIES('iceberg.catalog'='hadoop_catalog', 'iceberg.catalog_location'='hdfs://test-warehouse/ice-catalog'); In that case it would be quite explicit what's happening. And we'd set the table's location to what Iceberg computes for the table (from <Catalog location> + <table identifier>). We probably even want to prohibit explicitly setting the table LOCATION (so SHOW CREATE TABLE shouldn't include it either) when using hadoop catalog. So DROP TABLE wouldn't affect other tables, and DESCRIBE FORMATTED would automatically show the actual table LOCATION and 'iceberg.catalog_location' (since it's a table property). If we DROP all the tables in an iceberg catalog, then the empty catalog directory will still remain, but I don't see that as a serious issue. Tables created via HadoopTables are not affected, i.e. they continue to work like they already work. What do you think about this approach? -- To view, visit http://gerrit.cloudera.org:8080/16446 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic1893c50a633ca22d4bca6726c9937b026f5d5ef Gerrit-Change-Number: 16446 Gerrit-PatchSet: 13 Gerrit-Owner: wangsheng <sky...@163.com> Gerrit-Reviewer: Gabor Kaszab <gaborkas...@cloudera.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com> Gerrit-Reviewer: wangsheng <sky...@163.com> Gerrit-Comment-Date: Fri, 25 Sep 2020 10:04:16 +0000 Gerrit-HasComments: No