wangsheng has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16446 )

Change subject: IMPALA-10164: Supporting HadoopCatalog for Iceberg table
......................................................................


Patch Set 13:

> (2 comments)

Hi Zoltan, I've read your reply closely, and it seems we understand this
differently. Here are my thoughts on this patch:
1. We use the location in SQL as the table root path, like
'/test-warehouse/iceberg_test/hadoop_catalog/hadoop_catalog_test', regardless
of the structure under this location. If we use 'hadoop.catalog', the structure
looks like this:
        /test-warehouse/iceberg_test/hadoop_catalog/my_db/my_table/metadata/xxx
        /test-warehouse/iceberg_test/hadoop_catalog/my_db/my_table/data/xxx
And if we use 'hadoop.tables', the structure looks like this:
        /test-warehouse/iceberg_test/hadoop_catalog/metadata/xxx
        /test-warehouse/iceberg_test/hadoop_catalog/data/xxx
In this situation, whether we create a managed or an external Iceberg table
based on 'hadoop.catalog' or 'hadoop.tables', we just need to provide the
location '/test-warehouse/iceberg_test/hadoop_catalog'. Even if you don't
provide a location in SQL when creating a managed Iceberg table, we will still
use '$DEFAULT_WAREHOUSE/my_table' as the table root path. I think this keeps
'hadoop.catalog' and 'hadoop.tables' consistent, so we only need to remember
one table root path.

2. Given the above, when two managed tables are created on the same location
based on 'hadoop.catalog' and one of them is dropped, the location will be
deleted by HMS. I think this keeps HdfsTable and IcebergTable consistent. For
example, when two managed PARQUET tables are created on the same location and
one of them is dropped, the whole location is also deleted by HMS.
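
To make the two layouts in point 1 concrete, here is a minimal sketch in
Python. It only models the path structure described above; table_dirs and its
parameters are hypothetical names for illustration, not Impala or Iceberg APIs.

```python
import posixpath


def table_dirs(location, catalog, db=None, table=None):
    """Return the (metadata, data) directories under a SQL LOCATION.

    Hypothetical helper: it models only the two layouts described in
    point 1. 'db' and 'table' are required for 'hadoop.catalog' and
    ignored for 'hadoop.tables'.
    """
    if catalog == "hadoop.catalog":
        if not db or not table:
            raise ValueError("'hadoop.catalog' needs both db and table")
        # The table root sits two levels below the SQL location.
        root = posixpath.join(location, db, table)
    elif catalog == "hadoop.tables":
        # The SQL location is itself the table root.
        root = location
    else:
        raise ValueError("unknown catalog: %r" % (catalog,))
    return posixpath.join(root, "metadata"), posixpath.join(root, "data")


if __name__ == "__main__":
    loc = "/test-warehouse/iceberg_test/hadoop_catalog"
    print(table_dirs(loc, "hadoop.catalog", "my_db", "my_table"))
    print(table_dirs(loc, "hadoop.tables"))
```

In both cases the caller passes the same location, which is the consistency
argued for in point 1.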

Based on the above, here are my questions:

1. If we use HadoopCatalog.dropTable in the code, the root path
'/test-warehouse/iceberg_test/hadoop_catalog' will be kept and only
'/my_db/my_table' will be deleted. This differs from HadoopTables or a normal
HdfsTable, where DROP TABLE deletes the whole location. Would this confuse
users?

2. DESCRIBE FORMATTED shows the actual table location
'/test-warehouse/iceberg_test/hadoop_catalog/my_db/my_table', but SHOW CREATE
TABLE shows the SQL location '/test-warehouse/iceberg_test/hadoop_catalog'. I'm
not sure whether it is appropriate for two queries to return different
locations for the same table. For 'hadoop.tables', both queries return the same
location.
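
The difference raised in question 1 can be sketched on a temporary local
directory. This is a simplified model of the two drop behaviours as described
in this comment, not the actual HadoopCatalog or HMS code; all function names
are hypothetical.

```python
import os
import shutil
import tempfile


def make_table(location, db, table):
    """Create <location>/<db>/<table>/{metadata,data} on the local disk."""
    table_dir = os.path.join(location, db, table)
    os.makedirs(os.path.join(table_dir, "metadata"))
    os.makedirs(os.path.join(table_dir, "data"))
    return table_dir


def drop_table_only(location, db, table):
    """Model of the first behaviour: remove only <location>/<db>/<table>."""
    shutil.rmtree(os.path.join(location, db, table))


def drop_whole_location(location):
    """Model of the second behaviour: remove everything under the location."""
    shutil.rmtree(location)


def demo():
    base = tempfile.mkdtemp()
    try:
        location = os.path.join(base, "hadoop_catalog")
        make_table(location, "my_db", "my_table")
        make_table(location, "my_db", "other_table")

        # Dropping one table leaves the root path and the sibling table.
        drop_table_only(location, "my_db", "my_table")
        remaining = sorted(os.listdir(os.path.join(location, "my_db")))

        # Dropping the whole location removes the sibling table as well.
        drop_whole_location(location)
        return remaining, os.path.exists(location)
    finally:
        shutil.rmtree(base, ignore_errors=True)


if __name__ == "__main__":
    print(demo())
```

With the first behaviour a second table under the same location survives the
drop; with the second it does not.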

If you think the above two modifications are indeed better, I will adjust the
code in the current patch as soon as possible.


--
To view, visit http://gerrit.cloudera.org:8080/16446
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic1893c50a633ca22d4bca6726c9937b026f5d5ef
Gerrit-Change-Number: 16446
Gerrit-PatchSet: 13
Gerrit-Owner: wangsheng <[email protected]>
Gerrit-Reviewer: Gabor Kaszab <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>
Gerrit-Reviewer: wangsheng <[email protected]>
Gerrit-Comment-Date: Fri, 25 Sep 2020 02:39:18 +0000
Gerrit-HasComments: No
