[jira] [Commented] (IMPALA-10164) Support HadoopCatalog for Iceberg table

2020-10-07 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-10164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17209565#comment-17209565
 ] 

ASF subversion and git services commented on IMPALA-10164:
--

Commit 5912c4761701b309183f352b87de4edcd17e7c9d in impala's branch 
refs/heads/master from skyyws
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=5912c47 ]

IMPALA-10221: Rename 'iceberg_file_format' to 'iceberg.file_format' as Iceberg 
table property

IMPALA-10164 introduced several new table properties, such as
'iceberg.catalog'. To keep these property names consistent, we
rename 'iceberg_file_format' to 'iceberg.file_format'. When creating an
Iceberg table, we should use SQL like this:
  CREATE TABLE default.iceberg_test (
    level string,
    event_time timestamp,
    message string
  )
  STORED AS ICEBERG
  TBLPROPERTIES ('iceberg.file_format'='parquet',
    'iceberg.catalog'='hadoop.tables');

Change-Id: I722303fb765aca0f97a79bd6e4504765d355a623
Reviewed-on: http://gerrit.cloudera.org:8080/16550
Reviewed-by: Zoltan Borok-Nagy 
Tested-by: Impala Public Jenkins 


> Support HadoopCatalog for Iceberg table
> ---
>
> Key: IMPALA-10164
> URL: https://issues.apache.org/jira/browse/IMPALA-10164
> Project: IMPALA
>  Issue Type: Improvement
>Reporter: WangSheng
>Assignee: WangSheng
>Priority: Minor
>  Labels: impala-iceberg
>
> Impala currently supports only the HadoopTables API for creating Iceberg tables, 
> which is not enough, so we are preparing to support HadoopCatalog. The main 
> design is to add a new table property named 'iceberg.catalog', whose default 
> value is 'hadoop.tables'; we implement 'hadoop.catalog' to support the 
> HadoopCatalog API. We may even support 'hive.catalog' in the future.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-10164) Support HadoopCatalog for Iceberg table

2020-10-01 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-10164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17205707#comment-17205707
 ] 

ASF subversion and git services commented on IMPALA-10164:
--

Commit 5b720a4d18cc2f2ade54ab223663521a3822343f in impala's branch 
refs/heads/master from skyyws
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=5b720a4 ]

IMPALA-10164: Supporting HadoopCatalog for Iceberg table

This patch mainly implements creating Iceberg tables through HadoopCatalog.
Before this patch we only supported the HadoopTables API, but now we can
also use HadoopCatalog to create Iceberg tables. When creating a managed table,
we can use SQL like this:
  CREATE TABLE default.iceberg_test (
    level string,
    event_time timestamp,
    message string
  )
  STORED AS ICEBERG
  TBLPROPERTIES ('iceberg.catalog'='hadoop.catalog',
    'iceberg.catalog_location'='hdfs://test-warehouse/iceberg_test');
We now support two values ('hadoop.catalog', 'hadoop.tables') for
'iceberg.catalog'. If you don't specify this property in your SQL, the
default catalog type is 'hadoop.catalog'.
For an external Iceberg table, you can use SQL like this:
  CREATE EXTERNAL TABLE default.iceberg_test_external
  STORED AS ICEBERG
  TBLPROPERTIES ('iceberg.catalog'='hadoop.catalog',
    'iceberg.catalog_location'='hdfs://test-warehouse/iceberg_test',
    'iceberg.table_identifier'='default.iceberg_test');
We cannot set the table location for either managed or external Iceberg
tables with 'hadoop.catalog', and 'SHOW CREATE TABLE' does not display
the table location yet. We need to use 'DESCRIBE FORMATTED/EXTENDED' to
get this location info.
'iceberg.catalog_location' is required for 'hadoop.catalog' tables. It is
the location where Iceberg table metadata and data are stored, and we use this
location to load table metadata from Iceberg.
'iceberg.table_identifier' is used for the Iceberg TableIdentifier. If this
property is not specified in the SQL, Impala will use the database and table name
to load the Iceberg table, which is 'default.iceberg_test_external' in the SQL above.
The property value is split on '.', so you can also set it to a value like
'org.my_db.my_tbl'. This property is valid for both managed and external
tables.
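
The identifier fallback described above can be illustrated with a small
Python sketch. This is not Impala's implementation (the frontend is Java);
the function name resolve_table_identifier and its return shape are
hypothetical, chosen only to show the splitting rule: use
'iceberg.table_identifier' when given, otherwise 'database.table', then
split on '.' so that the last part is the table name and the preceding
parts form the namespace.

```python
from typing import Optional, Tuple


def resolve_table_identifier(
    database: str,
    table: str,
    table_identifier: Optional[str] = None,
) -> Tuple[Tuple[str, ...], str]:
    """Return (namespace levels, table name) for loading an Iceberg table.

    Hypothetical helper: falls back to "<database>.<table>" when the
    'iceberg.table_identifier' property is not set.
    """
    raw = table_identifier if table_identifier else f"{database}.{table}"
    parts = raw.split(".")
    if any(not part for part in parts):
        raise ValueError(f"invalid table identifier: {raw!r}")
    # Everything before the last '.' is the namespace; the last part is the name.
    return tuple(parts[:-1]), parts[-1]


if __name__ == "__main__":
    # Property not set: database and table name are used.
    print(resolve_table_identifier("default", "iceberg_test_external"))
    # Property set to a multi-level identifier.
    print(resolve_table_identifier("default", "t", "org.my_db.my_tbl"))
```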

Testing:
- Create table tests in functional_schema_template.sql
- Iceberg table create test in test_iceberg.py
- Iceberg table query test in test_scanners.py
- Iceberg table show create table test in test_show_create_table.py

Change-Id: Ic1893c50a633ca22d4bca6726c9937b026f5d5ef
Reviewed-on: http://gerrit.cloudera.org:8080/16446
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 





[jira] [Commented] (IMPALA-10164) Support HadoopCatalog for Iceberg table

2020-09-15 Thread Jira


[ 
https://issues.apache.org/jira/browse/IMPALA-10164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17196011#comment-17196011
 ] 

Zoltán Borók-Nagy commented on IMPALA-10164:


Hey [~skyyws], yeah this seems like a reasonable syntax, and it's also more convenient for 
the users. And it's probably better to let Iceberg come up with the exact table 
location.

What would be the output of 'DESCRIBE FORMATTED'? The catalog location 
or the table location? Maybe the table location is more informative in that 
case.

Also, please make sure that 'SHOW CREATE TABLE' shows the correct statement to 
produce the table, i.e. catalog location.




[jira] [Commented] (IMPALA-10164) Support HadoopCatalog for Iceberg table

2020-09-14 Thread WangSheng (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-10164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17195455#comment-17195455
 ] 

WangSheng commented on IMPALA-10164:


Hi [~boroknagyz], I've done some work on this Jira, and it's not very difficult 
to implement this feature. But HadoopCatalog is quite different from 
HadoopTables:
 # We only need a Configuration to construct HadoopTables, but HadoopCatalog needs 
another parameter, a location such as hdfs://xxx/warehouse/, which is used 
to store tables. When using HadoopCatalog, we need to provide a TableIdentifier, 
which mainly contains the database and table; Iceberg will then use the 
location 'hdfs://xxx/warehouse/database/table' to store the table info;
 # When creating an external table, we cannot use 
'hdfs://xxx/warehouse/database/table' to load the table directly; we need to use 
TableIdentifier.of(database, table) and 'hdfs://xxx/warehouse/' instead.
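
To make the path layout in the two points above concrete, here is a minimal
Python sketch of how the table location follows from the catalog location
plus the identifier. It only mirrors the
'hdfs://xxx/warehouse/database/table' pattern described above; the function
name hadoop_catalog_table_location is hypothetical and this is not Iceberg
or Impala code.

```python
def hadoop_catalog_table_location(catalog_location: str, database: str, table: str) -> str:
    """Join the warehouse (catalog) location with the database and table name.

    Hypothetical helper illustrating the '<warehouse>/<database>/<table>'
    layout; a trailing '/' on the catalog location is tolerated.
    """
    return "/".join([catalog_location.rstrip("/"), database, table])


if __name__ == "__main__":
    print(hadoop_catalog_table_location("hdfs://xxx/warehouse/", "database", "table"))
```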

So here is the problem: when creating an external table with HadoopCatalog, how should we 
define the location?
 * If we use 'hdfs://xxx/warehouse' in SQL, we can simply use this location and 
TableIdentifier.of(database, table) to load the table, but this usage is 
different from HdfsTable, which is a little weird;
 * If we use 'hdfs://xxx/warehouse/database/table' in SQL, we need to extract 
'hdfs://xxx/warehouse', 'database', and 'table' from this location and compare 
them with the database and table from 'create external table xxx'. If they match, we can 
load the table; otherwise we may throw an exception.
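
The second option could be sketched roughly as follows, in Python for
brevity. The helper names split_table_location and
warehouse_for_external_table are hypothetical, and the sketch assumes a
single-level database name, so the last two path components are exactly the
database and the table.

```python
from typing import Tuple


def split_table_location(location: str) -> Tuple[str, str, str]:
    """Split '<warehouse>/<database>/<table>' into its three parts."""
    # rsplit with maxsplit=2 keeps any '/' in the warehouse part intact.
    warehouse, database, table = location.rstrip("/").rsplit("/", 2)
    return warehouse, database, table


def warehouse_for_external_table(location: str, database: str, table: str) -> str:
    """Return the warehouse location if the path matches the declared table.

    Raises ValueError when the database/table in the path differ from the
    ones given in the CREATE EXTERNAL TABLE statement.
    """
    warehouse, loc_db, loc_tbl = split_table_location(location)
    if (loc_db, loc_tbl) != (database, table):
        raise ValueError(
            f"location points at {loc_db}.{loc_tbl}, expected {database}.{table}"
        )
    return warehouse


if __name__ == "__main__":
    print(warehouse_for_external_table(
        "hdfs://xxx/warehouse/database/table", "database", "table"))
```

The first option avoids this parsing and comparison entirely, at the cost of
a location that means something different from an HdfsTable location.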

What do you think? Here is my simple patch. I use the first method just to verify the approach; with 
this patch, we need to create the table like this:
{code:java}
create external table database.table 
stored as ICEBERG
location 'hdfs://test-warehouse'{code}
Here is the Gerrit url: https://gerrit.cloudera.org/#/c/16446/
