wangsheng has posted comments on this change. ( http://gerrit.cloudera.org:8080/16446 )
Change subject: IMPALA-10164: Supporting HadoopCatalog for Iceberg table
......................................................................
Patch Set 13:
> (2 comments)
Hi Zoltan, I have read your reply closely, and it seems we understand a few
things differently. Here are my thoughts on this patch:
1. We use the location given in SQL as the table root path, e.g.
'/test-warehouse/iceberg_test/hadoop_catalog/hadoop_catalog_test', regardless
of the structure under that location. With 'hadoop.catalog', the structure
looks like this:
/test-warehouse/iceberg_test/hadoop_catalog/my_db/my_table/metadata/xxx
/test-warehouse/iceberg_test/hadoop_catalog/my_db/my_table/data/xxx
With 'hadoop.tables', the structure looks like this:
/test-warehouse/iceberg_test/hadoop_catalog/metadata/xxx
/test-warehouse/iceberg_test/hadoop_catalog/data/xxx
In this situation, whether we create a managed or an external Iceberg table
based on 'hadoop.catalog' or 'hadoop.tables', we only need to provide the
location '/test-warehouse/iceberg_test/hadoop_catalog'. Even if no location is
given in SQL when creating a managed Iceberg table, we still use
'$DEFAULT_WAREHOUSE/my_table' as the table root path. I think this keeps
'hadoop.catalog' and 'hadoop.tables' consistent, so we only need to remember
one table root path.
2. Given the above, when two managed tables are created on the same location
based on 'hadoop.catalog' and one of them is dropped, the location is deleted
by HMS. I think this keeps HdfsTable and IcebergTable consistent. For example,
when two managed PARQUET tables are created on the same location and one of
them is dropped, the whole location is also deleted by HMS.
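To make the two layouts in point 1 concrete, here is a minimal sketch in
Python. It is not Impala or Iceberg code; the helper names table_root and
layout are hypothetical and only model how the SQL location maps to the
directory that holds metadata/ and data/.

```python
import posixpath


def table_root(location, catalog, db=None, table=None):
    """Return the directory that would hold metadata/ and data/.

    Hypothetical helper: models the layouts described above, nothing more.
    """
    if catalog == "hadoop.catalog":
        # The SQL location is the catalog root; the table lives under db/table.
        if db is None or table is None:
            raise ValueError("hadoop.catalog needs both db and table")
        return posixpath.join(location, db, table)
    if catalog == "hadoop.tables":
        # The SQL location is itself the table directory.
        return location
    raise ValueError("unknown catalog type: %r" % (catalog,))


def layout(location, catalog, db=None, table=None):
    """List the metadata and data directories for a table."""
    root = table_root(location, catalog, db, table)
    return [posixpath.join(root, "metadata"), posixpath.join(root, "data")]


if __name__ == "__main__":
    loc = "/test-warehouse/iceberg_test/hadoop_catalog"
    for path in layout(loc, "hadoop.catalog", "my_db", "my_table"):
        print(path)
    for path in layout(loc, "hadoop.tables"):
        print(path)
```

In both cases the caller passes the same location, which is the consistency
argued for in point 1.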
Based on the above, here are my questions:
1. If we use HadoopCatalog.dropTable in the code, the root path
'/test-warehouse/iceberg_test/hadoop_catalog' is retained and only
'/my_db/my_table' is deleted. This differs from HadoopTables and from a normal
HdfsTable, where DROP TABLE deletes the whole location. Would this confuse
users?
2. DESCRIBE FORMATTED shows the actual table location
'/test-warehouse/iceberg_test/hadoop_catalog/my_db/my_table', but SHOW CREATE
TABLE shows the SQL location '/test-warehouse/iceberg_test/hadoop_catalog'. I'm
not sure it is appropriate for two queries to return different locations for
the same table. For 'hadoop.tables', both return the same location.
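The difference raised in question 1 can be sketched on a local temporary
directory. This is a filesystem-only model in Python and calls neither
HadoopCatalog.dropTable nor HMS; the function names drop_catalog_table and
drop_whole_location are hypothetical, and the sketch assumes the catalog-style
drop removes exactly the 'my_db/my_table' directory and nothing else.

```python
import os
import shutil
import tempfile


def make_table(root):
    """Create an empty metadata/ and data/ directory pair under root."""
    os.makedirs(os.path.join(root, "metadata"))
    os.makedirs(os.path.join(root, "data"))


def drop_catalog_table(location, db, table):
    """Model of the catalog-style drop: only location/db/table is removed."""
    shutil.rmtree(os.path.join(location, db, table))


def drop_whole_location(location):
    """Model of the location-style drop: the whole location is removed."""
    shutil.rmtree(location)


def demo():
    base = tempfile.mkdtemp()
    try:
        # Catalog-style: two tables share one root, one of them is dropped.
        cat_loc = os.path.join(base, "hadoop_catalog")
        make_table(os.path.join(cat_loc, "my_db", "my_table"))
        make_table(os.path.join(cat_loc, "my_db", "other_table"))
        drop_catalog_table(cat_loc, "my_db", "my_table")
        catalog_result = {
            "root_exists": os.path.isdir(cat_loc),
            "dropped_table_exists": os.path.isdir(
                os.path.join(cat_loc, "my_db", "my_table")),
            "other_table_exists": os.path.isdir(
                os.path.join(cat_loc, "my_db", "other_table")),
        }

        # Location-style: the table directory is the location itself.
        tbl_loc = os.path.join(base, "hadoop_tables")
        make_table(tbl_loc)
        drop_whole_location(tbl_loc)
        location_result = {"root_exists": os.path.isdir(tbl_loc)}
        return catalog_result, location_result
    finally:
        # Clean up the temporary directory regardless of the outcome.
        shutil.rmtree(base, ignore_errors=True)


if __name__ == "__main__":
    print(demo())
```

In this model the catalog-style drop leaves the root path and the sibling
table in place, while the location-style drop removes everything under the
location.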
If you think the above two changes are indeed better, I will adjust the code
in the current patch as soon as possible.
--
To view, visit http://gerrit.cloudera.org:8080/16446
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic1893c50a633ca22d4bca6726c9937b026f5d5ef
Gerrit-Change-Number: 16446
Gerrit-PatchSet: 13
Gerrit-Owner: wangsheng <[email protected]>
Gerrit-Reviewer: Gabor Kaszab <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>
Gerrit-Reviewer: wangsheng <[email protected]>
Gerrit-Comment-Date: Fri, 25 Sep 2020 02:38:43 +0000
Gerrit-HasComments: No