Zoltan Borok-Nagy has posted comments on this change. (
http://gerrit.cloudera.org:8080/16446 )
Change subject: IMPALA-10164: Supporting HadoopCatalog for Iceberg table
......................................................................
Patch Set 13:
Hi WangSheng,
Yes, it's what I meant, I only have minor corrections, answered inline:
> 1. We prohibit explicitly setting the table LOCATION when using
> 'hadoop.catalog'
Yes, I think we should prohibit it and let Iceberg decide the location.
> 2. When execute 'SHOW CREATE TABLE', table location is
> 'hdfs://test-warehouse/ice-catalog/default/ice_t'
If we prohibit setting table location, then we can't output it in SHOW CREATE
TABLE.
> 3. When execute 'DESCRIBE FORMATTED', table location is
> 'hdfs://test-warehouse/ice-catalog/default/ice_t', and
> 'iceberg.catalog_location' is 'hdfs://test-warehouse/ice-catalog'
Yes.
> 4. When execute 'DROP TABLE', we only delete
> 'hdfs://test-warehouse/ice-catalog/default/ice_t', and reserve
> 'hdfs://test-warehouse/ice-catalog'
Yes.
> 5. 'iceberg.catalog_location' is necessary for both managed and external
> 'hadoop.catalog' Iceberg table
Yes. Maybe later we could also have a default location somewhere under the
warehouse directory. But for now I think explicit catalog locations are better.
> 6. When loading managed table, we use
> load('iceberg.catalog_location',TableIdentifier(db.tbl))
Yes.
> 7. When loading external table, we use
> load('iceberg.catalog_location',TableIdentifier('iceberg.table_name'))
Yes, maybe rename 'iceberg.table_name' to 'iceberg.table_identifier'? And the
value of it should be a dot-separated string of names that was originally used
to create the Iceberg table. Iceberg can already make table identifiers from
dot-separated strings:
https://github.com/apache/iceberg/blob/1a797cd986c9193f2e422fec295aaf25ce7e1916/api/src/main/java/org/apache/iceberg/catalog/TableIdentifier.java#L47
>
> Does my understanding correct?
>
> And still some questions:
>
> 1. When creating external 'hadoop.catalog' Iceberg table, are we load table
> by 'iceberg.catalog_location' and 'iceberg.table_name'? If so, both these two
> properties are necessary.
So instead of 'iceberg.table_name' we could have 'iceberg.table_identifier',
just to use Iceberg terminology and semantics. Yeah, probably we want it to be
a required property for EXTERNAL hadoop catalog tables. I just wonder if it
would make sense to infer it from 'db_name.tbl_name'. But let's start with it
as a required property, then later we'll se if inferring it makes sense and add
support for it in a future patch.
> 2. Can we set the table LOCATION for 'hadoop.catalog' external table? If not,
> how do we define table LOCATION? Using
> 'iceberg.catalog_location'+/db/external_tbl? Or use
> 'iceberg.catalog_location'+'iceberg.table_name'?
I think we should not set LOCATION. Iceberg uses catalog location + table
identifier to find and load tables, so I think we should provide those:
'iceberg.catalog_location'+'iceberg.table_identifier'
And if Iceberg can load the table with the given information, we can set the
actual table location. If the table is not found, we should raise an error.
--
To view, visit http://gerrit.cloudera.org:8080/16446
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic1893c50a633ca22d4bca6726c9937b026f5d5ef
Gerrit-Change-Number: 16446
Gerrit-PatchSet: 13
Gerrit-Owner: wangsheng <[email protected]>
Gerrit-Reviewer: Gabor Kaszab <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>
Gerrit-Reviewer: wangsheng <[email protected]>
Gerrit-Comment-Date: Fri, 25 Sep 2020 16:30:17 +0000
Gerrit-HasComments: No