Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16446 )

Change subject: IMPALA-10164: Supporting HadoopCatalog for Iceberg table
......................................................................


Patch Set 13:

Hi WangSheng,

Yes, it's what I meant, I only have minor corrections, answered inline:

> 1. We prohibit explicitly setting the table LOCATION when using 
> 'hadoop.catalog'
Yes, I think we should prohibit it and let Iceberg decide the location.

> 2. When execute 'SHOW CREATE TABLE', table location is 
> 'hdfs://test-warehouse/ice-catalog/default/ice_t'
If we prohibit setting table location, then we can't output it in SHOW CREATE 
TABLE.

> 3. When execute 'DESCRIBE FORMATTED', table location is 
> 'hdfs://test-warehouse/ice-catalog/default/ice_t', and 
> 'iceberg.catalog_location' is 'hdfs://test-warehouse/ice-catalog'
Yes.

> 4. When execute 'DROP TABLE', we only delete 
> 'hdfs://test-warehouse/ice-catalog/default/ice_t', and reserve 
> 'hdfs://test-warehouse/ice-catalog'
Yes.

> 5. 'iceberg.catalog_location' is necessary for both managed and external 
> 'hadoop.catalog' Iceberg table
Yes. Maybe later we could also have a default location somewhere under the 
warehouse directory. But for now I think explicit catalog locations are better.

> 6. When loading managed table, we use 
> load('iceberg.catalog_location',TableIdentifier(db.tbl))
Yes.

> 7. When loading external table, we use 
> load('iceberg.catalog_location',TableIdentifier('iceberg.table_name'))
Yes, maybe rename 'iceberg.table_name' to 'iceberg.table_identifier'? And the 
value of it should be a dot-separated string of names that was originally used 
to create the Iceberg table. Iceberg can already make table identifiers from 
dot-separated strings:
https://github.com/apache/iceberg/blob/1a797cd986c9193f2e422fec295aaf25ce7e1916/api/src/main/java/org/apache/iceberg/catalog/TableIdentifier.java#L47

>
> Does my understanding correct?
>
> And still some questions:
>
> 1. When creating external 'hadoop.catalog' Iceberg table, are we load table 
> by 'iceberg.catalog_location' and 'iceberg.table_name'? If so, both these two 
> properties are necessary.
So instead of 'iceberg.table_name' we could have 'iceberg.table_identifier', 
just to use Iceberg terminology and semantics. Yeah, probably we want it to be 
a required property for EXTERNAL hadoop catalog tables. I just wonder if it 
would make sense to infer it from 'db_name.tbl_name'. But let's start with it 
as a required property, then later we'll se if inferring it makes sense and add 
support for it in a future patch.

> 2. Can we set the table LOCATION for 'hadoop.catalog' external table? If not, 
> how do we define table LOCATION? Using 
> 'iceberg.catalog_location'+/db/external_tbl? Or use 
> 'iceberg.catalog_location'+'iceberg.table_name'?
I think we should not set LOCATION. Iceberg uses catalog location + table 
identifier to find and load tables, so I think we should provide those:
'iceberg.catalog_location'+'iceberg.table_identifier'
And if Iceberg can load the table with the given information, we can set the 
actual table location. If the table is not found, we should raise an error.


--
To view, visit http://gerrit.cloudera.org:8080/16446
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic1893c50a633ca22d4bca6726c9937b026f5d5ef
Gerrit-Change-Number: 16446
Gerrit-PatchSet: 13
Gerrit-Owner: wangsheng <sky...@163.com>
Gerrit-Reviewer: Gabor Kaszab <gaborkas...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com>
Gerrit-Reviewer: wangsheng <sky...@163.com>
Gerrit-Comment-Date: Fri, 25 Sep 2020 16:30:17 +0000
Gerrit-HasComments: No

Reply via email to