from:"Marton Bod \(Jira\)"

[jira] [Commented] (HIVE-26156) Iceberg delete writer should handle deleting from old partition specs

2022-04-21 Thread Marton Bod (Jira)



[ 
https://issues.apache.org/jira/browse/HIVE-26156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17525478#comment-17525478
 ] 

Marton Bod commented on HIVE-26156:
---

Pushed to master. Thanks [~szita] for the review!

> Iceberg delete writer should handle deleting from old partition specs
> -
>
> Key: HIVE-26156
> URL: https://issues.apache.org/jira/browse/HIVE-26156
> Project: Hive
>  Issue Type: Bug
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> While {{HiveIcebergRecordWriter}} always writes data out according to the 
> latest spec, the {{HiveIcebergDeleteWriter}} might have to write delete files 
> into partitions that correspond to a variety of specs, both old and new. 
> Therefore we should pass the {{table.specs()}} map into the 
> {{HiveIcebergWriter}} so that the delete writer can choose the appropriate 
> spec on a per-record basis.
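
As a rough illustration of the per-record choice described above, the sketch below models the specs map as a plain map keyed by spec ID. The names DeleteSpecChooser and specFor, and the strings standing in for partition specs, are hypothetical; this is not the actual Hive or Iceberg code.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a delete writer that is handed the full specs map.
// Partition specs are modelled as plain strings purely for illustration.
final class DeleteSpecChooser {
    private final Map<Integer, String> specsById;

    DeleteSpecChooser(Map<Integer, String> specsById) {
        // Defensive copy so later changes to the caller's map do not leak in.
        this.specsById = new HashMap<Integer, String>(specsById);
    }

    // Look up the spec that the deleted record's data file was written with.
    String specFor(int recordSpecId) {
        String spec = specsById.get(recordSpecId);
        if (spec == null) {
            throw new IllegalArgumentException("Unknown partition spec id: " + recordSpecId);
        }
        return spec;
    }
}
```

The point of the lookup is that a data writer only ever needs the newest entry, while a delete writer may need any entry, old or new, depending on the record.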



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

[jira] [Resolved] (HIVE-26156) Iceberg delete writer should handle deleting from old partition specs

2022-04-21 Thread Marton Bod (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-26156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marton Bod resolved HIVE-26156.
---
Resolution: Fixed

> Iceberg delete writer should handle deleting from old partition specs
> -
>
> Key: HIVE-26156
> URL: https://issues.apache.org/jira/browse/HIVE-26156
> Project: Hive
>  Issue Type: Bug
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> While {{HiveIcebergRecordWriter}} always writes data out according to the 
> latest spec, the {{HiveIcebergDeleteWriter}} might have to write delete files 
> into partitions that correspond to a variety of specs, both old and new. 
> Therefore we should pass the {{table.specs()}} map into the 
> {{HiveIcebergWriter}} so that the delete writer can choose the appropriate 
> spec on a per-record basis.




[jira] [Assigned] (HIVE-26156) Iceberg delete writer should handle deleting from old partition specs

2022-04-20 Thread Marton Bod (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-26156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marton Bod reassigned HIVE-26156:
-


> Iceberg delete writer should handle deleting from old partition specs
> -
>
> Key: HIVE-26156
> URL: https://issues.apache.org/jira/browse/HIVE-26156
> Project: Hive
>  Issue Type: Bug
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>
> While {{HiveIcebergRecordWriter}} always writes data out according to the 
> latest spec, the {{HiveIcebergDeleteWriter}} might have to write delete files 
> into partitions that correspond to a variety of specs, both old and new. 
> Therefore we should pass the {{table.specs()}} map into the 
> {{HiveIcebergWriter}} so that the delete writer can choose the appropriate 
> spec on a per-record basis.




[jira] [Assigned] (HIVE-26151) Support range-based time travel queries for Iceberg

2022-04-19 Thread Marton Bod (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-26151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marton Bod reassigned HIVE-26151:
-


> Support range-based time travel queries for Iceberg
> ---
>
> Key: HIVE-26151
> URL: https://issues.apache.org/jira/browse/HIVE-26151
> Project: Hive
>  Issue Type: New Feature
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>
> Allow querying which records have been inserted during a certain time window 
> for Iceberg tables. The Iceberg TableScan API provides an implementation for 
> that, so most of the work would go into adding syntax support and 
> transporting the startTime and endTime parameters to the Iceberg input format.
> Proposed new syntax: 
> SELECT * FROM table FOR SYSTEM_TIME FROM '' TO ''
> SELECT * FROM table FOR SYSTEM_VERSION FROM  TO 
> (the TO clause is optional in both cases)
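
To make the time-window idea concrete, here is a minimal sketch that filters commit timestamps by a start and an optional end. TimeWindowSketch and committedWithin are hypothetical names, and the boundary handling (exclusive start, inclusive end) is an assumption for illustration only; the proposal above does not state it.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: selects commit timestamps that fall inside a time window.
final class TimeWindowSketch {
    // Returns timestamps in (startTime, endTime]; a null endTime means "no upper bound",
    // mirroring the optional TO clause. The boundary semantics are assumed, not sourced.
    static List<Long> committedWithin(List<Long> commitTimesMillis, long startTime, Long endTime) {
        List<Long> result = new ArrayList<Long>();
        for (long ts : commitTimesMillis) {
            boolean afterStart = ts > startTime;
            boolean beforeEnd = endTime == null || ts <= endTime;
            if (afterStart && beforeEnd) {
                result.add(ts);
            }
        }
        return result;
    }
}
```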




[jira] [Commented] (HIVE-26102) Implement DELETE statements for Iceberg tables

2022-04-12 Thread Marton Bod (Jira)



[ 
https://issues.apache.org/jira/browse/HIVE-26102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17521023#comment-17521023
 ] 

Marton Bod commented on HIVE-26102:
---

Pushed to master.

Thanks [~pvary] for the thorough review!

> Implement DELETE statements for Iceberg tables
> --
>
> Key: HIVE-26102
> URL: https://issues.apache.org/jira/browse/HIVE-26102
> Project: Hive
>  Issue Type: New Feature
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 17h 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)

[jira] [Resolved] (HIVE-26102) Implement DELETE statements for Iceberg tables

2022-04-12 Thread Marton Bod (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-26102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marton Bod resolved HIVE-26102.
---
Resolution: Fixed

> Implement DELETE statements for Iceberg tables
> --
>
> Key: HIVE-26102
> URL: https://issues.apache.org/jira/browse/HIVE-26102
> Project: Hive
>  Issue Type: New Feature
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 17h 40m
>  Remaining Estimate: 0h
>





[jira] [Assigned] (HIVE-26102) Implement DELETE statements for Iceberg tables

2022-03-31 Thread Marton Bod (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-26102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marton Bod reassigned HIVE-26102:
-


> Implement DELETE statements for Iceberg tables
> --
>
> Key: HIVE-26102
> URL: https://issues.apache.org/jira/browse/HIVE-26102
> Project: Hive
>  Issue Type: New Feature
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>





[jira] [Commented] (HIVE-25989) CTLT HBaseStorageHandler is dropping underlying HBase table when failed

2022-03-10 Thread Marton Bod (Jira)



[ 
https://issues.apache.org/jira/browse/HIVE-25989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17504190#comment-17504190
 ] 

Marton Bod commented on HIVE-25989:
---

Pushed to master. Thanks [~pvary] for the review and [~nareshpr] for reporting 
the issue!

> CTLT HBaseStorageHandler is dropping underlying HBase table when failed
> ---
>
> Key: HIVE-25989
> URL: https://issues.apache.org/jira/browse/HIVE-25989
> Project: Hive
>  Issue Type: Bug
>Reporter: Naresh P R
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> With hive.strict.managed.tables and hive.create.as.acid enabled, the 
> Hive-HBase rollback code assumes a createTable failure rather than a CTLT 
> (CREATE TABLE LIKE) failure, and removes the underlying HBase table while 
> rolling back here:
> [https://github.com/apache/hive/blob/master/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseMetaHook.java#L187-L195]
>  
> Repro
>  
> {code:java}
> hbase
> =
> hbase shell
> create 'hbase_hive_table', 'cf'
> beeline
> ===
> set hive.support.concurrency=true;
> set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
> set hive.strict.managed.tables=true;
> set hive.create.as.acid=true;
> set hive.create.as.insert.only=true;
> set hive.default.fileformat.managed=ORC;
> > CREATE EXTERNAL TABLE `hbase_hive_table`(                       
>    `key` int COMMENT '',                            
>    `value` string COMMENT '')                       
>  ROW FORMAT SERDE                                   
>    'org.apache.hadoop.hive.hbase.HBaseSerDe'        
>  STORED BY                                          
>    'org.apache.hadoop.hive.hbase.HBaseStorageHandler'  
>  WITH SERDEPROPERTIES (                             
>    'hbase.columns.mapping'=':key,cf:cf')                      
>  TBLPROPERTIES ('hbase.table.name'='hbase_hive_table');
> > select * from hbase_hive_table;
> +---+-+
> | hbase_hive_table.key  | hbase_hive_table.value  |
> +---+-+
> +---+-+
> > create table new_hbase_hive_table like hbase_hive_table;
> Caused by: org.apache.hadoop.hive.metastore.api.MetaException: The table must 
> be stored using an ACID compliant format (such as ORC): 
> default.new_hbase_hive_table
> > select * from hbase_hive_table;
> Error: java.io.IOException: org.apache.hadoop.hbase.TableNotFoundException: 
> hbase_hive_table
> {code}
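
One way to picture the failure mode in the repro is a rollback that only undoes what the failed operation itself created. The sketch below is an assumption for illustration, not necessarily the fix that was merged; RollbackSketch and all of its methods are hypothetical and use in-memory sets instead of HBase.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical model of a metadata hook that tracks which backing tables it created.
final class RollbackSketch {
    private final Set<String> existingTables = new HashSet<String>();
    private final Set<String> createdByThisOperation = new HashSet<String>();

    // A backing table that existed before the Hive statement ran.
    void registerExisting(String name) {
        existingTables.add(name);
    }

    // Create the backing table only if it is missing, and remember that we did so.
    void createIfMissing(String name) {
        if (existingTables.add(name)) {
            createdByThisOperation.add(name);
        }
    }

    // On failure, drop the backing table only if this operation created it.
    void rollback(String name) {
        if (createdByThisOperation.remove(name)) {
            existingTables.remove(name);
        }
    }

    boolean exists(String name) {
        return existingTables.contains(name);
    }
}
```

With such a guard, a failed CREATE TABLE LIKE that points at a pre-existing backing table would leave that table in place.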




[jira] [Resolved] (HIVE-25989) CTLT HBaseStorageHandler is dropping underlying HBase table when failed

2022-03-10 Thread Marton Bod (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-25989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marton Bod resolved HIVE-25989.
---
Resolution: Fixed

> CTLT HBaseStorageHandler is dropping underlying HBase table when failed
> ---
>
> Key: HIVE-25989
> URL: https://issues.apache.org/jira/browse/HIVE-25989
> Project: Hive
>  Issue Type: Bug
>Reporter: Naresh P R
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> With hive.strict.managed.tables and hive.create.as.acid enabled, the 
> Hive-HBase rollback code assumes a createTable failure rather than a CTLT 
> (CREATE TABLE LIKE) failure, and removes the underlying HBase table while 
> rolling back here:
> [https://github.com/apache/hive/blob/master/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseMetaHook.java#L187-L195]
>  
> Repro
>  
> {code:java}
> hbase
> =
> hbase shell
> create 'hbase_hive_table', 'cf'
> beeline
> ===
> set hive.support.concurrency=true;
> set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
> set hive.strict.managed.tables=true;
> set hive.create.as.acid=true;
> set hive.create.as.insert.only=true;
> set hive.default.fileformat.managed=ORC;
> > CREATE EXTERNAL TABLE `hbase_hive_table`(                       
>    `key` int COMMENT '',                            
>    `value` string COMMENT '')                       
>  ROW FORMAT SERDE                                   
>    'org.apache.hadoop.hive.hbase.HBaseSerDe'        
>  STORED BY                                          
>    'org.apache.hadoop.hive.hbase.HBaseStorageHandler'  
>  WITH SERDEPROPERTIES (                             
>    'hbase.columns.mapping'=':key,cf:cf')                      
>  TBLPROPERTIES ('hbase.table.name'='hbase_hive_table');
> > select * from hbase_hive_table;
> +---+-+
> | hbase_hive_table.key  | hbase_hive_table.value  |
> +---+-+
> +---+-+
> > create table new_hbase_hive_table like hbase_hive_table;
> Caused by: org.apache.hadoop.hive.metastore.api.MetaException: The table must 
> be stored using an ACID compliant format (such as ORC): 
> default.new_hbase_hive_table
> > select * from hbase_hive_table;
> Error: java.io.IOException: org.apache.hadoop.hbase.TableNotFoundException: 
> hbase_hive_table
> {code}




[jira] [Commented] (HIVE-26004) Upgrade Iceberg to 0.13.1

2022-03-04 Thread Marton Bod (Jira)



[ 
https://issues.apache.org/jira/browse/HIVE-26004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17501219#comment-17501219
 ] 

Marton Bod commented on HIVE-26004:
---

Pushed to master. Thanks [~pvary] for the review.

> Upgrade Iceberg to 0.13.1
> -
>
> Key: HIVE-26004
> URL: https://issues.apache.org/jira/browse/HIVE-26004
> Project: Hive
>  Issue Type: Improvement
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>





[jira] [Resolved] (HIVE-26004) Upgrade Iceberg to 0.13.1

2022-03-04 Thread Marton Bod (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-26004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marton Bod resolved HIVE-26004.
---
Resolution: Fixed

> Upgrade Iceberg to 0.13.1
> -
>
> Key: HIVE-26004
> URL: https://issues.apache.org/jira/browse/HIVE-26004
> Project: Hive
>  Issue Type: Improvement
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>





[jira] [Assigned] (HIVE-26004) Upgrade Iceberg to 0.13.1

2022-03-04 Thread Marton Bod (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-26004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marton Bod reassigned HIVE-26004:
-


> Upgrade Iceberg to 0.13.1
> -
>
> Key: HIVE-26004
> URL: https://issues.apache.org/jira/browse/HIVE-26004
> Project: Hive
>  Issue Type: Improvement
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>





[jira] [Assigned] (HIVE-25989) CTLT HBaseStorageHandler is dropping underlying HBase table when failed

2022-03-01 Thread Marton Bod (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-25989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marton Bod reassigned HIVE-25989:
-

Assignee: Marton Bod

> CTLT HBaseStorageHandler is dropping underlying HBase table when failed
> ---
>
> Key: HIVE-25989
> URL: https://issues.apache.org/jira/browse/HIVE-25989
> Project: Hive
>  Issue Type: Bug
>Reporter: Naresh P R
>Assignee: Marton Bod
>Priority: Major
>
> With hive.strict.managed.tables and hive.create.as.acid enabled, the 
> Hive-HBase rollback code assumes a createTable failure rather than a CTLT 
> (CREATE TABLE LIKE) failure, and removes the underlying HBase table while 
> rolling back here:
> [https://github.com/apache/hive/blob/master/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseMetaHook.java#L187-L195]
>  
> Repro
>  
> {code:java}
> hbase
> =
> hbase shell
> create 'hbase_hive_table', 'cf'
> beeline
> ===
> set hive.support.concurrency=true;
> set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
> set hive.strict.managed.tables=true;
> set hive.create.as.acid=true;
> set hive.create.as.insert.only=true;
> set hive.default.fileformat.managed=ORC;
> > CREATE EXTERNAL TABLE `hbase_hive_table`(                       
>    `key` int COMMENT '',                            
>    `value` string COMMENT '')                       
>  ROW FORMAT SERDE                                   
>    'org.apache.hadoop.hive.hbase.HBaseSerDe'        
>  STORED BY                                          
>    'org.apache.hadoop.hive.hbase.HBaseStorageHandler'  
>  WITH SERDEPROPERTIES (                             
>    'hbase.columns.mapping'=':key,cf:cf')                      
>  TBLPROPERTIES ('hbase.table.name'='hbase_hive_table');
> > select * from hbase_hive_table;
> +---+-+
> | hbase_hive_table.key  | hbase_hive_table.value  |
> +---+-+
> +---+-+
> > create table new_hbase_hive_table like hbase_hive_table;
> Caused by: org.apache.hadoop.hive.metastore.api.MetaException: The table must 
> be stored using an ACID compliant format (such as ORC): 
> default.new_hbase_hive_table
> > select * from hbase_hive_table;
> Error: java.io.IOException: org.apache.hadoop.hbase.TableNotFoundException: 
> hbase_hive_table
> {code}




[jira] [Assigned] (HIVE-25894) Table migration to Iceberg doesn't remove HMS partitions

2022-01-25 Thread Marton Bod (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-25894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marton Bod reassigned HIVE-25894:
-

Assignee: Marton Bod

> Table migration to Iceberg doesn't remove HMS partitions
> 
>
> Key: HIVE-25894
> URL: https://issues.apache.org/jira/browse/HIVE-25894
> Project: Hive
>  Issue Type: Bug
>Reporter: Zoltán Borók-Nagy
>Assignee: Marton Bod
>Priority: Major
>
> Repro:
> {code:java}
> create table ice_part_migrate (i int) partitioned by (p int) stored as 
> parquet;
> insert into ice_part_migrate partition(p=1) values (1), (11), (111);
> insert into ice_part_migrate partition(p=2) values (2), (22), (222);
> ALTER TABLE ice_part_migrate  SET TBLPROPERTIES 
> ('storage_handler'='org.apache.iceberg.mr.hive.HiveIcebergStorageHandler');
> {code}
> Then looking at the HMS database:
> {code:java}
> => select "PART_NAME" from "PARTITIONS" p, "TBLS" t where 
> t."TBL_ID"=p."TBL_ID" and t."TBL_NAME"='ice_part_migrate';
>  PART_NAME
> ---
>  p=1
>  p=2
> {code}
> This is weird because Iceberg tables are supposed to be unpartitioned. It 
> also breaks some precondition checks in Impala. Is there a particular reason 
> to keep the partitions in HMS?




[jira] [Commented] (HIVE-25894) Table migration to Iceberg doesn't remove HMS partitions

2022-01-25 Thread Marton Bod (Jira)



[ 
https://issues.apache.org/jira/browse/HIVE-25894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17481669#comment-17481669
 ] 

Marton Bod commented on HIVE-25894:
---

[~boroknagyz] interesting, thanks for raising this! If you load the table from 
HMS, it is unpartitioned: 

table.getPartitionKeys() returns an empty list. I'm not sure why the partitions 
are not purged from the database as well, or whether this causes any problems.

> Table migration to Iceberg doesn't remove HMS partitions
> 
>
> Key: HIVE-25894
> URL: https://issues.apache.org/jira/browse/HIVE-25894
> Project: Hive
>  Issue Type: Bug
>Reporter: Zoltán Borók-Nagy
>Priority: Major
>
> Repro:
> {code:java}
> create table ice_part_migrate (i int) partitioned by (p int) stored as 
> parquet;
> insert into ice_part_migrate partition(p=1) values (1), (11), (111);
> insert into ice_part_migrate partition(p=2) values (2), (22), (222);
> ALTER TABLE ice_part_migrate  SET TBLPROPERTIES 
> ('storage_handler'='org.apache.iceberg.mr.hive.HiveIcebergStorageHandler');
> {code}
> Then looking at the HMS database:
> {code:java}
> => select "PART_NAME" from "PARTITIONS" p, "TBLS" t where 
> t."TBL_ID"=p."TBL_ID" and t."TBL_NAME"='ice_part_migrate';
>  PART_NAME
> ---
>  p=1
>  p=2
> {code}
> This is weird because Iceberg tables are supposed to be unpartitioned. It 
> also breaks some precondition checks in Impala. Is there a particular reason 
> to keep the partitions in HMS?




[jira] [Commented] (HIVE-25891) Improve Iceberg error message for unsupported vectorization cases

2022-01-24 Thread Marton Bod (Jira)



[ 
https://issues.apache.org/jira/browse/HIVE-25891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17481043#comment-17481043
 ] 

Marton Bod commented on HIVE-25891:
---

Thanks for reviewing, [~szita] and [~pvary]!

> Improve Iceberg error message for unsupported vectorization cases
> -
>
> Key: HIVE-25891
> URL: https://issues.apache.org/jira/browse/HIVE-25891
> Project: Hive
>  Issue Type: Improvement
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently, if you attempt to read a Parquet or Avro Iceberg table with 
> vectorization turned on, you will eventually get an error message since it's 
> not supported. However, this error message is very misleading and does not 
> explain clearly what the problem is and how to work around it. 




[jira] [Resolved] (HIVE-25891) Improve Iceberg error message for unsupported vectorization cases

2022-01-24 Thread Marton Bod (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-25891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marton Bod resolved HIVE-25891.
---
Resolution: Fixed

> Improve Iceberg error message for unsupported vectorization cases
> -
>
> Key: HIVE-25891
> URL: https://issues.apache.org/jira/browse/HIVE-25891
> Project: Hive
>  Issue Type: Improvement
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently, if you attempt to read a Parquet or Avro Iceberg table with 
> vectorization turned on, you will eventually get an error message since it's 
> not supported. However, this error message is very misleading and does not 
> explain clearly what the problem is and how to work around it. 




[jira] [Commented] (HIVE-25891) Improve Iceberg error message for unsupported vectorization cases

2022-01-24 Thread Marton Bod (Jira)



[ 
https://issues.apache.org/jira/browse/HIVE-25891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17480986#comment-17480986
 ] 

Marton Bod commented on HIVE-25891:
---

PR: [https://github.com/apache/hive/pull/2965]

 

> Improve Iceberg error message for unsupported vectorization cases
> -
>
> Key: HIVE-25891
> URL: https://issues.apache.org/jira/browse/HIVE-25891
> Project: Hive
>  Issue Type: Improvement
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>
> Currently, if you attempt to read a Parquet or Avro Iceberg table with 
> vectorization turned on, you will eventually get an error message since it's 
> not supported. However, this error message is very misleading and does not 
> explain clearly what the problem is and how to work around it. 




[jira] [Assigned] (HIVE-25891) Improve Iceberg error message for unsupported vectorization cases

2022-01-24 Thread Marton Bod (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-25891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marton Bod reassigned HIVE-25891:
-


> Improve Iceberg error message for unsupported vectorization cases
> -
>
> Key: HIVE-25891
> URL: https://issues.apache.org/jira/browse/HIVE-25891
> Project: Hive
>  Issue Type: Improvement
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>
> Currently, if you attempt to read a Parquet or Avro Iceberg table with 
> vectorization turned on, you will eventually get an error message since it's 
> not supported. However, this error message is very misleading and does not 
> explain clearly what the problem is and how to work around it. 




[jira] [Commented] (HIVE-25890) Fix truncate problem with Iceberg CTAS tables

2022-01-24 Thread Marton Bod (Jira)



[ 
https://issues.apache.org/jira/browse/HIVE-25890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17480981#comment-17480981
 ] 

Marton Bod commented on HIVE-25890:
---

Pushed to master. Thanks [~pvary] for the review!

> Fix truncate problem with Iceberg CTAS tables
> -
>
> Key: HIVE-25890
> URL: https://issues.apache.org/jira/browse/HIVE-25890
> Project: Hive
>  Issue Type: Bug
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>
> Currently Iceberg CTAS tables cannot be truncated in a subsequent operation. 
> This is because we populate the table properties differently on the CTAS 
> codepath, and the external.table.purge=true is not populated in this case.
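
The description implies that truncation depends on the external.table.purge table property being set. A minimal sketch of such a precondition check follows; TruncateCheckSketch and canTruncate are hypothetical names, and the real check in Hive may look quite different.

```java
import java.util.Map;

// Hypothetical precondition check keyed on the table property named in the description.
final class TruncateCheckSketch {
    // Treats a missing property as "false", so a table created without it cannot be truncated.
    static boolean canTruncate(Map<String, String> tableProperties) {
        String purge = tableProperties.get("external.table.purge");
        return Boolean.parseBoolean(purge);
    }
}
```

Under this model, a table created through a code path that never sets the property fails the check, which matches the symptom reported for CTAS tables.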




[jira] [Commented] (HIVE-25890) Fix truncate problem with Iceberg CTAS tables

2022-01-24 Thread Marton Bod (Jira)



[ 
https://issues.apache.org/jira/browse/HIVE-25890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17480982#comment-17480982
 ] 

Marton Bod commented on HIVE-25890:
---

PR: [https://github.com/apache/hive/pull/2963]

 

> Fix truncate problem with Iceberg CTAS tables
> -
>
> Key: HIVE-25890
> URL: https://issues.apache.org/jira/browse/HIVE-25890
> Project: Hive
>  Issue Type: Bug
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>
> Currently Iceberg CTAS tables cannot be truncated in a subsequent operation. 
> This is because we populate the table properties differently on the CTAS 
> codepath, and the external.table.purge=true is not populated in this case.




[jira] [Resolved] (HIVE-25890) Fix truncate problem with Iceberg CTAS tables

2022-01-24 Thread Marton Bod (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-25890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marton Bod resolved HIVE-25890.
---
Resolution: Fixed

> Fix truncate problem with Iceberg CTAS tables
> -
>
> Key: HIVE-25890
> URL: https://issues.apache.org/jira/browse/HIVE-25890
> Project: Hive
>  Issue Type: Bug
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>
> Currently Iceberg CTAS tables cannot be truncated in a subsequent operation. 
> This is because we populate the table properties differently on the CTAS 
> codepath, and the external.table.purge=true is not populated in this case.




[jira] [Assigned] (HIVE-25890) Fix truncate problem with Iceberg CTAS tables

2022-01-24 Thread Marton Bod (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-25890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marton Bod reassigned HIVE-25890:
-


> Fix truncate problem with Iceberg CTAS tables
> -
>
> Key: HIVE-25890
> URL: https://issues.apache.org/jira/browse/HIVE-25890
> Project: Hive
>  Issue Type: Bug
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>
> Currently Iceberg CTAS tables cannot be truncated in a subsequent operation. 
> This is because we populate the table properties differently on the CTAS 
> codepath, and the external.table.purge=true is not populated in this case.




[jira] [Resolved] (HIVE-25843) Add flag to disable Iceberg FileIO config serialization

2022-01-12 Thread Marton Bod (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-25843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marton Bod resolved HIVE-25843.
---
Resolution: Fixed

> Add flag to disable Iceberg FileIO config serialization
> ---
>
> Key: HIVE-25843
> URL: https://issues.apache.org/jira/browse/HIVE-25843
> Project: Hive
>  Issue Type: Improvement
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Hive serializes the Iceberg table object into each individual split. Since 
> the FileIO is part of the Iceberg table and it has its own Hadoop 
> configuration, this configuration will be the dominant factor determining the 
> size of the serialized split. In our tests we have found that due to this 
> serialized config, Iceberg splits are 15-20x larger than normal Hive splits 
> (which led to OOM in some of our perf tests).
> This PR proposes to introduce a config which can turn off this config 
> serialization, and let the deserializer-side fill out the config values 
> instead (which works for Hive executors, since they have all the config 
> values in hand). This can reduce the Iceberg split size by ~20x based on 
> local tests.
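
The effect of such a flag can be pictured with plain Java serialization: the same split payload is written with and without its configuration map. SplitSizeSketch, serializedSize and the serializeConf parameter are hypothetical and only illustrate the size difference; they do not reflect the actual Hive or Iceberg split classes or the name of the new config.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of a split whose configuration may or may not be serialized.
final class SplitSizeSketch {
    // Returns the number of bytes written for one split.
    // When serializeConf is false, the config is skipped and the reader side is
    // expected to supply its own configuration values.
    static int serializedSize(String filePath, Map<String, String> conf, boolean serializeConf) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeUTF(filePath);
            out.writeObject(serializeConf ? new HashMap<String, String>(conf) : null);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.size();
    }
}
```

Because the configuration is repeated in every split, any saving per split is multiplied by the number of splits in a query.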




[jira] [Commented] (HIVE-25843) Add flag to disable Iceberg FileIO config serialization

2022-01-12 Thread Marton Bod (Jira)



[ 
https://issues.apache.org/jira/browse/HIVE-25843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17474527#comment-17474527
 ] 

Marton Bod commented on HIVE-25843:
---

Pushed to master. Thanks [~pvary] for reviewing!

> Add flag to disable Iceberg FileIO config serialization
> ---
>
> Key: HIVE-25843
> URL: https://issues.apache.org/jira/browse/HIVE-25843
> Project: Hive
>  Issue Type: Improvement
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Hive serializes the Iceberg table object into each individual split. Since 
> the FileIO is part of the Iceberg table and it has its own Hadoop 
> configuration, this configuration will be the dominant factor determining the 
> size of the serialized split. In our tests we have found that due to this 
> serialized config, Iceberg splits are 15-20x larger than normal Hive splits 
> (which led to OOM in some of our perf tests).
> This PR proposes to introduce a config which can turn off this config 
> serialization, and let the deserializer-side fill out the config values 
> instead (which works for Hive executors, since they have all the config 
> values in hand). This can reduce the Iceberg split size by ~20x based on 
> local tests.




[jira] [Resolved] (HIVE-25849) Disable insert overwrite for bucket partitioned Iceberg tables

2022-01-07 Thread Marton Bod (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-25849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marton Bod resolved HIVE-25849.
---
Resolution: Fixed

> Disable insert overwrite for bucket partitioned Iceberg tables
> --
>
> Key: HIVE-25849
> URL: https://issues.apache.org/jira/browse/HIVE-25849
> Project: Hive
>  Issue Type: Improvement
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Insert overwrite should be disabled where the target Iceberg table is a 
> bucket partitioned table, since which existing partitions will be overwritten 
> is very hard to predict from a user's POV, as it depends on the bucket hash 
> values calculated for the new dataset's rows. It's better to be on the safe 
> side and disable this operation to avoid unwanted data loss.
> Note: this is the same approach followed by Impala.
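
To see why the overwritten partitions are hard to predict, consider which buckets a new dataset touches. In the sketch below, BucketOverwriteSketch and its methods are hypothetical, and String.hashCode() is only a stand-in for the hash function used by Iceberg's bucket transform.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration of how incoming rows map onto bucket partitions.
final class BucketOverwriteSketch {
    // Stand-in bucket function: hash the value, then reduce it to [0, numBuckets).
    static int bucketOf(String value, int numBuckets) {
        return Math.floorMod(value.hashCode(), numBuckets);
    }

    // The set of bucket partitions an insert overwrite of these values would replace.
    static Set<Integer> bucketsTouched(List<String> newValues, int numBuckets) {
        Set<Integer> touched = new TreeSet<Integer>();
        for (String value : newValues) {
            touched.add(bucketOf(value, numBuckets));
        }
        return touched;
    }
}
```

With 4 buckets and this stand-in hash, the values "a", "b" and "e" touch buckets 1 and 2 only, so any existing rows in those two buckets would be replaced while buckets 0 and 3 stay as they are. A user cannot tell this from the values without computing the hashes.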




[jira] [Commented] (HIVE-25849) Disable insert overwrite for bucket partitioned Iceberg tables

2022-01-07 Thread Marton Bod (Jira)



[ 
https://issues.apache.org/jira/browse/HIVE-25849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17470700#comment-17470700
 ] 

Marton Bod commented on HIVE-25849:
---

Pushed to master. Thanks [~szita] and [~pvary] for checking it.

> Disable insert overwrite for bucket partitioned Iceberg tables
> --
>
> Key: HIVE-25849
> URL: https://issues.apache.org/jira/browse/HIVE-25849
> Project: Hive
>  Issue Type: Improvement
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Insert overwrite should be disabled where the target Iceberg table is a 
> bucket partitioned table, since which existing partitions will be overwritten 
> is very hard to predict from a user's POV, as it depends on the bucket hash 
> values calculated for the new dataset's rows. It's better to be on the safe 
> side and disable this operation to avoid unwanted data loss.
> Note: this is the same approach followed by Impala.




[jira] [Updated] (HIVE-25849) Disable insert overwrite for bucket partitioned Iceberg tables

2022-01-06 Thread Marton Bod (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-25849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marton Bod updated HIVE-25849:
--
Description: 
Insert overwrite should be disabled where the target Iceberg table is a bucket 
partitioned table, since which existing partitions will be overwritten is very 
hard to predict from a user's POV, as it depends on the bucket hash values 
calculated for the new dataset's rows. It's better to be on the safe side and 
disable this operation to avoid unwanted data loss.

Note: this is the same approach followed by Impala too.

  was:Insert overwrite should be disabled where the target Iceberg table is a 
bucket partitioned table, since which existing partitions will be overwritten 
is very hard to predict from a user's POV, as it depends on the bucket hash 
values calculated for the new dataset's rows. It's better to be on the safe 
side and disable this operation to avoid unwanted data loss.


> Disable insert overwrite for bucket partitioned Iceberg tables
> --
>
> Key: HIVE-25849
> URL: https://issues.apache.org/jira/browse/HIVE-25849
> Project: Hive
>  Issue Type: Improvement
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>
> Insert overwrite should be disabled where the target Iceberg table is a 
> bucket partitioned table, since which existing partitions will be overwritten 
> is very hard to predict from a user's POV, as it depends on the bucket hash 
> values calculated for the new dataset's rows. It's better to be on the safe 
> side and disable this operation to avoid unwanted data loss.
> Note: this is the same approach followed by Impala too.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

[jira] [Commented] (HIVE-25849) Disable insert overwrite for bucket partitioned Iceberg tables

2022-01-06 Thread Marton Bod (Jira)



[ 
https://issues.apache.org/jira/browse/HIVE-25849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17469972#comment-17469972
 ] 

Marton Bod commented on HIVE-25849:
---

PR: [https://github.com/apache/hive/pull/2856/]

 

> Disable insert overwrite for bucket partitioned Iceberg tables
> --
>
> Key: HIVE-25849
> URL: https://issues.apache.org/jira/browse/HIVE-25849
> Project: Hive
>  Issue Type: Improvement
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>
> Insert overwrite should be disabled where the target Iceberg table is a 
> bucket partitioned table, since which existing partitions will be overwritten 
> is very hard to predict from a user's POV, as it depends on the bucket hash 
> values calculated for the new dataset's rows. It's better to be on the safe 
> side and disable this operation to avoid unwanted data loss.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

[jira] [Assigned] (HIVE-25849) Disable insert overwrite for bucket partitioned Iceberg tables

2022-01-06 Thread Marton Bod (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-25849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marton Bod reassigned HIVE-25849:
-


> Disable insert overwrite for bucket partitioned Iceberg tables
> --
>
> Key: HIVE-25849
> URL: https://issues.apache.org/jira/browse/HIVE-25849
> Project: Hive
>  Issue Type: Improvement
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>
> Insert overwrite should be disabled where the target Iceberg table is a 
> bucket partitioned table, since which existing partitions will be overwritten 
> is very hard to predict from a user's POV, as it depends on the bucket hash 
> values calculated for the new dataset's rows. It's better to be on the safe 
> side and disable this operation to avoid unwanted data loss.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

[jira] [Assigned] (HIVE-25843) Add flag to disable Iceberg FileIO config serialization

2022-01-04 Thread Marton Bod (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-25843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marton Bod reassigned HIVE-25843:
-


> Add flag to disable Iceberg FileIO config serialization
> ---
>
> Key: HIVE-25843
> URL: https://issues.apache.org/jira/browse/HIVE-25843
> Project: Hive
>  Issue Type: Improvement
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>
> Hive serializes the Iceberg table object into each individual split. Since 
> the FileIO is part of the Iceberg table and it has its own Hadoop 
> configuration, this configuration will be the dominant factor determining the 
> size of the serialized split. In our tests we have found that due to this 
> serialized config, Iceberg splits are 15-20x larger than normal Hive splits 
> (which led to OOM in some of our perf tests).
> This PR proposes to introduce a config which can turn off this config 
> serialization, and let the deserializer side fill in the config values 
> instead (which works for Hive executors, since they have all the config 
> values in hand). This can reduce the Iceberg split size by ~20x based on 
> local tests.
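
The idea can be sketched in Java as follows. This is only an illustration under stated assumptions: `SplitSketch`, its `serializeConfig` constructor argument and `effectiveConfig` are hypothetical names, not Hive or Iceberg classes, and a plain `HashMap` stands in for the Hadoop configuration carried by the FileIO.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only: a hypothetical split object, not a Hive or Iceberg class.
final class SplitSketch implements Serializable {
  private static final long serialVersionUID = 1L;

  final String path;
  // Stand-in for the FileIO's Hadoop configuration; null when config serialization is off.
  final HashMap<String, String> fileIoConfig;

  SplitSketch(String path, Map<String, String> config, boolean serializeConfig) {
    this.path = path;
    this.fileIoConfig = serializeConfig ? new HashMap<>(config) : null;
  }

  // The reading side falls back to its own local configuration when none was shipped.
  Map<String, String> effectiveConfig(Map<String, String> localConfig) {
    return fileIoConfig != null ? fileIoConfig : localConfig;
  }

  // Returns the Java-serialized size of the split in bytes.
  static int serializedSize(SplitSketch split) {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
      out.writeObject(split);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return bytes.size();
  }

  // Builds a configuration with the given number of made-up entries.
  static Map<String, String> sampleConfig(int entries) {
    Map<String, String> conf = new HashMap<>();
    for (int i = 0; i < entries; i++) {
      conf.put("sample.key." + i, "value-" + i);
    }
    return conf;
  }

  public static void main(String[] args) {
    Map<String, String> conf = sampleConfig(500);
    System.out.println("with config:    " + serializedSize(new SplitSketch("f", conf, true)));
    System.out.println("without config: " + serializedSize(new SplitSketch("f", conf, false)));
  }
}
```

The sizes printed by this toy example say nothing about the real 15-20x figure; they only show that the configuration dominates the payload once it is included.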



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

[jira] [Commented] (HIVE-25815) Add flag to skip maven validate phase for Iceberg modules

2021-12-16 Thread Marton Bod (Jira)



[ 
https://issues.apache.org/jira/browse/HIVE-25815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17460690#comment-17460690
 ] 

Marton Bod commented on HIVE-25815:
---

Pushed to master. Thanks [~pvary] for reviewing!

> Add flag to skip maven validate phase for Iceberg modules
> -
>
> Key: HIVE-25815
> URL: https://issues.apache.org/jira/browse/HIVE-25815
> Project: Hive
>  Issue Type: Improvement
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>
> The Iceberg checkstyle and spotless plugins which run during the validate 
> phase are quite strict in terms of enforcing proper code style. This is great 
> when checking in code to master, but it can be inconvenient when doing quick 
> dev iterations locally. This PR introduces a maven {{validate.skip}} flag to 
> be able to skip checkstyle and spotless checks on demand for the Iceberg 
> modules.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

[jira] [Commented] (HIVE-25815) Add flag to skip maven validate phase for Iceberg modules

2021-12-16 Thread Marton Bod (Jira)



[ 
https://issues.apache.org/jira/browse/HIVE-25815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17460689#comment-17460689
 ] 

Marton Bod commented on HIVE-25815:
---

PR: [https://github.com/apache/hive/pull/2882]

 

> Add flag to skip maven validate phase for Iceberg modules
> -
>
> Key: HIVE-25815
> URL: https://issues.apache.org/jira/browse/HIVE-25815
> Project: Hive
>  Issue Type: Improvement
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>
> The Iceberg checkstyle and spotless plugins which run during the validate 
> phase are quite strict in terms of enforcing proper code style. This is great 
> when checking in code to master, but it can be inconvenient when doing quick 
> dev iterations locally. This PR introduces a maven {{validate.skip}} flag to 
> be able to skip checkstyle and spotless checks on demand for the Iceberg 
> modules.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

[jira] [Resolved] (HIVE-25815) Add flag to skip maven validate phase for Iceberg modules

2021-12-16 Thread Marton Bod (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-25815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marton Bod resolved HIVE-25815.
---
Resolution: Fixed

> Add flag to skip maven validate phase for Iceberg modules
> -
>
> Key: HIVE-25815
> URL: https://issues.apache.org/jira/browse/HIVE-25815
> Project: Hive
>  Issue Type: Improvement
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>
> The Iceberg checkstyle and spotless plugins which run during the validate 
> phase are quite strict in terms of enforcing proper code style. This is great 
> when checking in code to master, but it can be inconvenient when doing quick 
> dev iterations locally. This PR introduces a maven {{validate.skip}} flag to 
> be able to skip checkstyle and spotless checks on demand for the Iceberg 
> modules.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

[jira] [Assigned] (HIVE-25815) Add flag to skip maven validate phase for Iceberg modules

2021-12-16 Thread Marton Bod (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-25815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marton Bod reassigned HIVE-25815:
-


> Add flag to skip maven validate phase for Iceberg modules
> -
>
> Key: HIVE-25815
> URL: https://issues.apache.org/jira/browse/HIVE-25815
> Project: Hive
>  Issue Type: Improvement
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>
> The Iceberg checkstyle and spotless plugins which run during the validate 
> phase are quite strict in terms of enforcing proper code style. This is great 
> when checking in code to master, but it can be inconvenient when doing quick 
> dev iterations locally. This PR introduces a maven {{validate.skip}} flag to 
> be able to skip checkstyle and spotless checks on demand for the Iceberg 
> modules.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

[jira] [Resolved] (HIVE-25788) Iceberg CTAS should honor location clause and have correct table properties

2021-12-08 Thread Marton Bod (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-25788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marton Bod resolved HIVE-25788.
---
Resolution: Fixed

> Iceberg CTAS should honor location clause and have correct table properties
> ---
>
> Key: HIVE-25788
> URL: https://issues.apache.org/jira/browse/HIVE-25788
> Project: Hive
>  Issue Type: Bug
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Currently Iceberg CTAS does not take the LOCATION clause into consideration. 
> Also, these tables end up with some unintended table properties coming from 
> the SerDe, such as partition.columns or partition.columns.comments, etc.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

[jira] [Commented] (HIVE-25788) Iceberg CTAS should honor location clause and have correct table properties

2021-12-08 Thread Marton Bod (Jira)



[ 
https://issues.apache.org/jira/browse/HIVE-25788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17455277#comment-17455277
 ] 

Marton Bod commented on HIVE-25788:
---

Pushed to master. Thanks [~pvary] and [~szita] for the reviews!

> Iceberg CTAS should honor location clause and have correct table properties
> ---
>
> Key: HIVE-25788
> URL: https://issues.apache.org/jira/browse/HIVE-25788
> Project: Hive
>  Issue Type: Bug
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Currently Iceberg CTAS does not take the LOCATION clause into consideration. 
> Also, these tables end up with some unintended table properties coming from 
> the SerDe, such as partition.columns or partition.columns.comments, etc.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

[jira] [Assigned] (HIVE-25788) Iceberg CTAS should honor location clause and have correct table properties

2021-12-08 Thread Marton Bod (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-25788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marton Bod reassigned HIVE-25788:
-


> Iceberg CTAS should honor location clause and have correct table properties
> ---
>
> Key: HIVE-25788
> URL: https://issues.apache.org/jira/browse/HIVE-25788
> Project: Hive
>  Issue Type: Bug
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>
> Currently Iceberg CTAS does not take the LOCATION clause into consideration. 
> Also, these tables end up with some unintended table properties coming from 
> the SerDe, such as partition.columns or partition.columns.comments, etc.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

[jira] [Resolved] (HIVE-25740) Handle race condition between compaction txn abort/commit and heartbeater

2021-12-02 Thread Marton Bod (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-25740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marton Bod resolved HIVE-25740.
---
Resolution: Fixed

> Handle race condition between compaction txn abort/commit and heartbeater
> -
>
> Key: HIVE-25740
> URL: https://issues.apache.org/jira/browse/HIVE-25740
> Project: Hive
>  Issue Type: Bug
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 4h 10m
>  Remaining Estimate: 0h
>
> The issue is the following: once the compaction worker finishes, 
> commitTxn/abortTxn is invoked first, and the heartbeater thread is only 
> interrupted after that. This can lead to a race condition: the txn has 
> already been deleted from the backend DB via commit/abort, but the 
> concurrently running heartbeater thread still attempts to send a last 
> heartbeat. The txn id is then not found in the DB, leading to 
> {{NoSuchTxnException}}.
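
One way to avoid this kind of race is to stop the heartbeater and wait for it to exit before ending the transaction. The Java sketch below shows that ordering under stated assumptions: all names are hypothetical stand-ins rather than Hive's compaction code, and it is not necessarily the approach taken in the actual patch.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative sketch only: hypothetical stand-ins, not Hive's compaction worker code.
final class HeartbeatOrderingSketch {
  private final AtomicBoolean txnOpen = new AtomicBoolean(true);
  private final AtomicInteger lateHeartbeats = new AtomicInteger();
  private final Thread heartbeater = new Thread(this::heartbeatLoop, "heartbeater-sketch");

  private void heartbeatLoop() {
    while (!Thread.currentThread().isInterrupted()) {
      if (!txnOpen.get()) {
        // A heartbeat for an already finished txn: the case that fails in the scenario above.
        lateHeartbeats.incrementAndGet();
      }
      try {
        Thread.sleep(5); // stand-in for the heartbeat interval
      } catch (InterruptedException e) {
        return; // interrupted: stop heartbeating
      }
    }
  }

  // Stops the heartbeater and waits for it to exit BEFORE ending the txn.
  void finishSafely() {
    heartbeater.interrupt();
    try {
      heartbeater.join();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
    txnOpen.set(false); // stand-in for commitTxn/abortTxn
  }

  // Runs one simulated compaction and returns the number of late heartbeats observed.
  static int runOnce() {
    HeartbeatOrderingSketch sketch = new HeartbeatOrderingSketch();
    sketch.heartbeater.start();
    sketch.finishSafely();
    return sketch.lateHeartbeats.get();
  }

  public static void main(String[] args) {
    System.out.println("late heartbeats: " + runOnce());
  }
}
```

Because the heartbeater thread has already terminated when the transaction is closed, it can never observe a closed transaction.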



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

[jira] [Commented] (HIVE-25740) Handle race condition between compaction txn abort/commit and heartbeater

2021-12-02 Thread Marton Bod (Jira)



[ 
https://issues.apache.org/jira/browse/HIVE-25740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17452495#comment-17452495
 ] 

Marton Bod commented on HIVE-25740:
---

Pushed to master. Thanks for reviewing, [~klcopp], [~pvary] and [~szita]!

> Handle race condition between compaction txn abort/commit and heartbeater
> -
>
> Key: HIVE-25740
> URL: https://issues.apache.org/jira/browse/HIVE-25740
> Project: Hive
>  Issue Type: Bug
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 4h 10m
>  Remaining Estimate: 0h
>
> The issue is the following: once the compaction worker finishes, 
> commitTxn/abortTxn is invoked first, and the heartbeater thread is only 
> interrupted after that. This can lead to a race condition: the txn has 
> already been deleted from the backend DB via commit/abort, but the 
> concurrently running heartbeater thread still attempts to send a last 
> heartbeat. The txn id is then not found in the DB, leading to 
> {{NoSuchTxnException}}.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

[jira] [Commented] (HIVE-25754) Fix column projection for union all queries with multiple aliases

2021-12-02 Thread Marton Bod (Jira)



[ 
https://issues.apache.org/jira/browse/HIVE-25754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17452248#comment-17452248
 ] 

Marton Bod commented on HIVE-25754:
---

Pushed to master. Thanks [~kkasa] for reviewing!

> Fix column projection for union all queries with multiple aliases
> -
>
> Key: HIVE-25754
> URL: https://issues.apache.org/jira/browse/HIVE-25754
> Project: Hive
>  Issue Type: Bug
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>
> Given two tables:
> {code:java}
> create table source1 (dt string, d1 int, d2 int) stored as orc;
> create table source2 (dt string, d1 int, d2 int) stored as orc;
> insert into source1 values ('20211107', 1, 2);
> insert into source2 values ('20211108', 11, 22);{code}
> If you run this query with UNION ALL, the {{key}} column will be missing from 
> the output:
> {code:java}
> select explode(map('D219', D219
> ,'D220', D220)) as (key, value) from (
>    select '20211107' as date_key
> ,1 as D219
> ,2 as D220
> ) t
> union all
> select explode(map('D221', D221
> ,'D222', D222)) as (key, value)
> from (
>   select '20211107' as date_key
> ,1 as D221
> ,2 as D222
> ) t{code}
> Result:
> {code:java}
> 1
> 2
> 11
> 22{code}
>  Correct result should be:
> {code:java}
> D219 1
> D220 2
> D221 11
> D222 22{code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

[jira] [Resolved] (HIVE-25754) Fix column projection for union all queries with multiple aliases

2021-12-02 Thread Marton Bod (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-25754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marton Bod resolved HIVE-25754.
---
Resolution: Fixed

> Fix column projection for union all queries with multiple aliases
> -
>
> Key: HIVE-25754
> URL: https://issues.apache.org/jira/browse/HIVE-25754
> Project: Hive
>  Issue Type: Bug
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>
> Given two tables:
> {code:java}
> create table source1 (dt string, d1 int, d2 int) stored as orc;
> create table source2 (dt string, d1 int, d2 int) stored as orc;
> insert into source1 values ('20211107', 1, 2);
> insert into source2 values ('20211108', 11, 22);{code}
> If you run this query with UNION ALL, the {{key}} column will be missing from 
> the output:
> {code:java}
> select explode(map('D219', D219
> ,'D220', D220)) as (key, value) from (
>    select '20211107' as date_key
> ,1 as D219
> ,2 as D220
> ) t
> union all
> select explode(map('D221', D221
> ,'D222', D222)) as (key, value)
> from (
>   select '20211107' as date_key
> ,1 as D221
> ,2 as D222
> ) t{code}
> Result:
> {code:java}
> 1
> 2
> 11
> 22{code}
>  Correct result should be:
> {code:java}
> D219 1
> D220 2
> D221 11
> D222 22{code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

[jira] [Updated] (HIVE-25754) Fix column projection for union all queries with multiple aliases

2021-12-01 Thread Marton Bod (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-25754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marton Bod updated HIVE-25754:
--
Description: 
Given two tables:
{code:java}
create table source1 (dt string, d1 int, d2 int) stored as orc;
create table source2 (dt string, d1 int, d2 int) stored as orc;
insert into source1 values ('20211107', 1, 2);
insert into source2 values ('20211108', 11, 22);{code}
If you run this query with UNION ALL, the {{key}} column will be missing from 
the output:
{code:java}
select explode(map('D219', D219
,'D220', D220)) as (key, value) from (
   select '20211107' as date_key
,1 as D219
,2 as D220
) t
union all
select explode(map('D221', D221
,'D222', D222)) as (key, value)
from (
  select '20211107' as date_key
,1 as D221
,2 as D222
) t{code}
Result:
{code:java}
1
2
11
22{code}
 Correct result should be:
{code:java}
D219 1
D220 2
D221 11
D222 22{code}

  was:
Given two tables:

 
{code:java}
create table source1 (dt string, d1 int, d2 int) stored as orc;
create table source2 (dt string, d1 int, d2 int) stored as orc;
insert into source1 values ('20211107', 1, 2);
insert into source2 values ('20211108', 11, 22);{code}

If you run this query with UNION ALL, the {{key}} column will be missing from 
the output:

 

 
{code:java}
select explode(map('D219', D219
,'D220', D220)) as (key, value) from (
   select '20211107' as date_key
,1 as D219
,2 as D220
) t
union all
select explode(map('D221', D221
,'D222', D222)) as (key, value)
from (
  select '20211107' as date_key
,1 as D221
,2 as D222
) t{code}

Result:

 
{code:java}
1
2
11
22{code}
 
Correct result should be:
{code:java}
D219 1
D220 2
D221 11
D222 22{code}


> Fix column projection for union all queries with multiple aliases
> -
>
> Key: HIVE-25754
> URL: https://issues.apache.org/jira/browse/HIVE-25754
> Project: Hive
>  Issue Type: Bug
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>
> Given two tables:
> {code:java}
> create table source1 (dt string, d1 int, d2 int) stored as orc;
> create table source2 (dt string, d1 int, d2 int) stored as orc;
> insert into source1 values ('20211107', 1, 2);
> insert into source2 values ('20211108', 11, 22);{code}
> If you run this query with UNION ALL, the {{key}} column will be missing from 
> the output:
> {code:java}
> select explode(map('D219', D219
> ,'D220', D220)) as (key, value) from (
>    select '20211107' as date_key
> ,1 as D219
> ,2 as D220
> ) t
> union all
> select explode(map('D221', D221
> ,'D222', D222)) as (key, value)
> from (
>   select '20211107' as date_key
> ,1 as D221
> ,2 as D222
> ) t{code}
> Result:
> {code:java}
> 1
> 2
> 11
> 22{code}
>  Correct result should be:
> {code:java}
> D219 1
> D220 2
> D221 11
> D222 22{code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

[jira] [Commented] (HIVE-25754) Fix column projection for union all queries with multiple aliases

2021-12-01 Thread Marton Bod (Jira)



[ 
https://issues.apache.org/jira/browse/HIVE-25754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17451733#comment-17451733
 ] 

Marton Bod commented on HIVE-25754:
---

PR: [https://github.com/apache/hive/pull/2822]

 

> Fix column projection for union all queries with multiple aliases
> -
>
> Key: HIVE-25754
> URL: https://issues.apache.org/jira/browse/HIVE-25754
> Project: Hive
>  Issue Type: Bug
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>
> Given two tables:
>  
> {code:java}
> create table source1 (dt string, d1 int, d2 int) stored as orc;
> create table source2 (dt string, d1 int, d2 int) stored as orc;
> insert into source1 values ('20211107', 1, 2);
> insert into source2 values ('20211108', 11, 22);{code}
> If you run this query with UNION ALL, the {{key}} column will be missing from 
> the output:
>  
>  
> {code:java}
> select explode(map('D219', D219
> ,'D220', D220)) as (key, value) from (
>    select '20211107' as date_key
> ,1 as D219
> ,2 as D220
> ) t
> union all
> select explode(map('D221', D221
> ,'D222', D222)) as (key, value)
> from (
>   select '20211107' as date_key
> ,1 as D221
> ,2 as D222
> ) t{code}
> Result:
>  
> {code:java}
> 1
> 2
> 11
> 22{code}
>  
> Correct result should be:
> {code:java}
> D219 1
> D220 2
> D221 11
> D222 22{code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

[jira] [Updated] (HIVE-25754) Fix column projection for union all queries with multiple aliases

2021-12-01 Thread Marton Bod (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-25754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marton Bod updated HIVE-25754:
--
Description: 
Given two tables:

 
{code:java}
create table source1 (dt string, d1 int, d2 int) stored as orc;
create table source2 (dt string, d1 int, d2 int) stored as orc;
insert into source1 values ('20211107', 1, 2);
insert into source2 values ('20211108', 11, 22);{code}

If you run this query with UNION ALL, the {{key}} column will be missing from 
the output:

 

 
{code:java}
select explode(map('D219', D219
,'D220', D220)) as (key, value) from (
   select '20211107' as date_key
,1 as D219
,2 as D220
) t
union all
select explode(map('D221', D221
,'D222', D222)) as (key, value)
from (
  select '20211107' as date_key
,1 as D221
,2 as D222
) t{code}

Result:

 
{code:java}
1
2
11
22{code}
 
Correct result should be:
{code:java}
D219 1
D220 2
D221 11
D222 22{code}

  was:
Given two tables:

{{create table source1 (dt string, d1 int, d2 int) stored as orc;
create table source2 (dt string, d1 int, d2 int) stored as orc;
insert into source1 values ('20211107', 1, 2);
insert into source2 values ('20211108', 11, 22);}}
If you run this query with UNION ALL, the {{key}} column will be missing from 
the output:

{{select explode(map('D219', D219
,'D220', D220)) as (key, value) from (}}
   {{select '20211107' as date_key
,1 as D219
,2 as D220
) t}}
{{union all}}
{{select explode(map('D221', D221
,'D222', D222)) as (key, value)
from (}}
  {{select '20211107' as date_key
,1 as D221
,2 as D222
) t}}
Result:

{{1}}
{{2}}
{{11}}
{{22}}
 
Correct result should be:

{{D219  1}}
{{D220  2}}
{{D221  11}}
{{D222  22}}


> Fix column projection for union all queries with multiple aliases
> -
>
> Key: HIVE-25754
> URL: https://issues.apache.org/jira/browse/HIVE-25754
> Project: Hive
>  Issue Type: Bug
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>
> Given two tables:
>  
> {code:java}
> create table source1 (dt string, d1 int, d2 int) stored as orc;
> create table source2 (dt string, d1 int, d2 int) stored as orc;
> insert into source1 values ('20211107', 1, 2);
> insert into source2 values ('20211108', 11, 22);{code}
> If you run this query with UNION ALL, the {{key}} column will be missing from 
> the output:
>  
>  
> {code:java}
> select explode(map('D219', D219
> ,'D220', D220)) as (key, value) from (
>    select '20211107' as date_key
> ,1 as D219
> ,2 as D220
> ) t
> union all
> select explode(map('D221', D221
> ,'D222', D222)) as (key, value)
> from (
>   select '20211107' as date_key
> ,1 as D221
> ,2 as D222
> ) t{code}
> Result:
>  
> {code:java}
> 1
> 2
> 11
> 22{code}
>  
> Correct result should be:
> {code:java}
> D219 1
> D220 2
> D221 11
> D222 22{code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

[jira] [Assigned] (HIVE-25754) Fix column projection for union all queries with multiple aliases

2021-12-01 Thread Marton Bod (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-25754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marton Bod reassigned HIVE-25754:
-


> Fix column projection for union all queries with multiple aliases
> -
>
> Key: HIVE-25754
> URL: https://issues.apache.org/jira/browse/HIVE-25754
> Project: Hive
>  Issue Type: Bug
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>
> Given two tables:
> {{create table source1 (dt string, d1 int, d2 int) stored as orc;
> create table source2 (dt string, d1 int, d2 int) stored as orc;
> insert into source1 values ('20211107', 1, 2);
> insert into source2 values ('20211108', 11, 22);}}
> If you run this query with UNION ALL, the {{key}} column will be missing from 
> the output:
> {{select explode(map('D219', D219
> ,'D220', D220)) as (key, value) from (}}
>    {{select '20211107' as date_key
> ,1 as D219
> ,2 as D220
> ) t}}
> {{union all}}
> {{select explode(map('D221', D221
> ,'D222', D222)) as (key, value)
> from (}}
>   {{select '20211107' as date_key
> ,1 as D221
> ,2 as D222
> ) t}}
> Result:
> {{1}}
> {{2}}
> {{11}}
> {{22}}
>  
> Correct result should be:
> {{D219 1}}
> {{D220 2}}
> {{D221 11}}
> {{D222 22}}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

[jira] [Resolved] (HIVE-25741) HiveProtoLoggingHook EventLogger should always close old writer

2021-11-29 Thread Marton Bod (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-25741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marton Bod resolved HIVE-25741.
---
Resolution: Fixed

> HiveProtoLoggingHook EventLogger should always close old writer
> ---
>
> Key: HIVE-25741
> URL: https://issues.apache.org/jira/browse/HIVE-25741
> Project: Hive
>  Issue Type: Bug
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> If {{hive.hook.proto.file.per.event=true}} (recommended for S3A filesystem), 
> the Hive proto {{EventLogger}} will create a new file for each proto event. 
> However, if we already had an appropriate writer (i.e. 
> {{maybeRolloverWriterForDay()}} returns false) from some previous operation, 
> we don't close the previous writer instance before creating a new one.
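
The intended behaviour can be sketched in Java as follows. This is a minimal illustration with hypothetical names (`PerEventWriterSketch`, `TrackedWriter`, `openWriters`); it does not mirror the internals of `HiveProtoLoggingHook`, and the writer only counts open handles instead of touching a filesystem.

```java
import java.io.Closeable;

// Illustrative sketch only: names are hypothetical and do not mirror HiveProtoLoggingHook.
final class PerEventWriterSketch {
  private final boolean filePerEvent;
  private int openWriters;
  private TrackedWriter writer;

  // Stand-in for a file writer; it only tracks how many writers are currently open.
  final class TrackedWriter implements Closeable {
    TrackedWriter() {
      openWriters++;
    }

    void write(String event) {
      // A real writer would append the event to its file here.
    }

    @Override
    public void close() {
      openWriters--;
    }
  }

  PerEventWriterSketch(boolean filePerEvent) {
    this.filePerEvent = filePerEvent;
  }

  void writeEvent(String event) {
    if (filePerEvent && writer != null) {
      // Always release the previous writer before replacing it, otherwise it leaks.
      writer.close();
      writer = null;
    }
    if (writer == null) {
      writer = new TrackedWriter();
    }
    writer.write(event);
  }

  void shutdown() {
    if (writer != null) {
      writer.close();
      writer = null;
    }
  }

  int openWriters() {
    return openWriters;
  }

  public static void main(String[] args) {
    PerEventWriterSketch logger = new PerEventWriterSketch(true);
    logger.writeEvent("event-1");
    logger.writeEvent("event-2");
    System.out.println("open writers: " + logger.openWriters());
    logger.shutdown();
  }
}
```

With the close call in place, at most one writer is open at any time, no matter how many events are written.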



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

[jira] [Commented] (HIVE-25741) HiveProtoLoggingHook EventLogger should always close old writer

2021-11-29 Thread Marton Bod (Jira)



[ 
https://issues.apache.org/jira/browse/HIVE-25741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17450495#comment-17450495
 ] 

Marton Bod commented on HIVE-25741:
---

Pushed to master. Thanks [~pvary] for reviewing!

> HiveProtoLoggingHook EventLogger should always close old writer
> ---
>
> Key: HIVE-25741
> URL: https://issues.apache.org/jira/browse/HIVE-25741
> Project: Hive
>  Issue Type: Bug
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> If {{hive.hook.proto.file.per.event=true}} (recommended for S3A filesystem), 
> the Hive proto {{EventLogger}} will create a new file for each proto event. 
> However, if we already had an appropriate writer (i.e. 
> {{maybeRolloverWriterForDay()}} returns false) from some previous operation, 
> we don't close the previous writer instance before creating a new one.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

[jira] [Commented] (HIVE-25741) HiveProtoLoggingHook EventLogger should always close old writer

2021-11-26 Thread Marton Bod (Jira)



[ 
https://issues.apache.org/jira/browse/HIVE-25741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17449565#comment-17449565
 ] 

Marton Bod commented on HIVE-25741:
---

PR: [https://github.com/apache/hive/pull/2819]

 

> HiveProtoLoggingHook EventLogger should always close old writer
> ---
>
> Key: HIVE-25741
> URL: https://issues.apache.org/jira/browse/HIVE-25741
> Project: Hive
>  Issue Type: Bug
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>
> If {{hive.hook.proto.file.per.event=true}} (recommended for S3A filesystem), 
> the Hive proto {{EventLogger}} will create a new file for each proto event. 
> However, if we already had an appropriate writer (i.e. 
> {{maybeRolloverWriterForDay()}} returns false) from some previous operation, 
> we don't close the previous writer instance before creating a new one.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

[jira] [Assigned] (HIVE-25741) HiveProtoLoggingHook EventLogger should always close old writer

2021-11-26 Thread Marton Bod (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-25741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marton Bod reassigned HIVE-25741:
-


> HiveProtoLoggingHook EventLogger should always close old writer
> ---
>
> Key: HIVE-25741
> URL: https://issues.apache.org/jira/browse/HIVE-25741
> Project: Hive
>  Issue Type: Bug
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>
> If {{hive.hook.proto.file.per.event=true}} (recommended for S3A filesystem), 
> the Hive proto {{EventLogger}} will create a new file for each proto event. 
> However, if we already had an appropriate writer (i.e. 
> {{maybeRolloverWriterForDay()}} returns false) from some previous operation, 
> we don't close the previous writer instance before creating a new one.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

[jira] [Commented] (HIVE-25740) Handle race condition between compaction txn abort/commit and heartbeater

2021-11-26 Thread Marton Bod (Jira)



[ 
https://issues.apache.org/jira/browse/HIVE-25740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17449492#comment-17449492
 ] 

Marton Bod commented on HIVE-25740:
---

PR: [https://github.com/apache/hive/pull/2817]

 

> Handle race condition between compaction txn abort/commit and heartbeater
> -
>
> Key: HIVE-25740
> URL: https://issues.apache.org/jira/browse/HIVE-25740
> Project: Hive
>  Issue Type: Bug
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>
> The issue is the following: once the compaction worker finishes, 
> commitTxn/abortTxn is invoked first, and the heartbeater thread is only 
> interrupted after that. This can lead to a race condition: the txn has 
> already been deleted from the backend DB via commit/abort, but the 
> concurrently running heartbeater thread still attempts to send a last 
> heartbeat. The txn id is then not found in the DB, leading to 
> {{NoSuchTxnException}}.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

[jira] [Assigned] (HIVE-25740) Handle race condition between compaction txn abort/commit and heartbeater

2021-11-26 Thread Marton Bod (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-25740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marton Bod reassigned HIVE-25740:
-


> Handle race condition between compaction txn abort/commit and heartbeater
> -
>
> Key: HIVE-25740
> URL: https://issues.apache.org/jira/browse/HIVE-25740
> Project: Hive
>  Issue Type: Bug
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>
> The issue is the following: once the compaction worker finishes, 
> commitTxn/abortTxn is invoked first, and the heartbeater thread is only 
> interrupted after that. This can lead to a race condition: the txn has 
> already been deleted from the backend DB via commit/abort, but the 
> concurrently running heartbeater thread still attempts to send one last 
> heartbeat. Since the txn id can no longer be found in the DB, this results 
> in a {{NoSuchTxnException}}.




[jira] [Resolved] (HIVE-25727) Iceberg hive catalog should create table object with initialised SerdeParams

2021-11-22 Thread Marton Bod (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-25727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marton Bod resolved HIVE-25727.
---
Resolution: Fixed

> Iceberg hive catalog should create table object with initialised SerdeParams
> 
>
> Key: HIVE-25727
> URL: https://issues.apache.org/jira/browse/HIVE-25727
> Project: Hive
>  Issue Type: Improvement
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Currently, we leave serdeInfo.parameters as null when we create the table 
> object to be persisted at commit time in the Iceberg Hive catalog. We should 
> initialize the parameters with an empty map to avoid possible NPEs.
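
As a language-neutral illustration of the idea above (the real change is in Hive's Java code), here is a minimal Python sketch. The SerDeInfo class and the serde_param() helper are hypothetical stand-ins, not the HMS API; the point is only that defaulting the parameters to an empty map lets callers read them without a null check.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class SerDeInfo:
    """Hypothetical stand-in for the serde info stored on the table object."""

    serialization_lib: str = ""
    # default_factory gives every instance its own empty dict instead of None.
    parameters: Dict[str, str] = field(default_factory=dict)


def serde_param(info: SerDeInfo, key: str, default: Optional[str] = None) -> Optional[str]:
    # Safe without a None check because parameters is always a dict.
    return info.parameters.get(key, default)
```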




[jira] [Commented] (HIVE-25727) Iceberg hive catalog should create table object with initialised SerdeParams

2021-11-22 Thread Marton Bod (Jira)



[ 
https://issues.apache.org/jira/browse/HIVE-25727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17447322#comment-17447322
 ] 

Marton Bod commented on HIVE-25727:
---

Pushed to master. Thanks [~pvary] for reviewing it!

> Iceberg hive catalog should create table object with initialised SerdeParams
> 
>
> Key: HIVE-25727
> URL: https://issues.apache.org/jira/browse/HIVE-25727
> Project: Hive
>  Issue Type: Improvement
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Currently, we leave serdeInfo.parameters as null when we create the table 
> object to be persisted at commit time in the Iceberg Hive catalog. We should 
> initialize the parameters with an empty map to avoid possible NPEs.




[jira] [Assigned] (HIVE-25727) Iceberg hive catalog should create table object with initialised SerdeParams

2021-11-19 Thread Marton Bod (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-25727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marton Bod reassigned HIVE-25727:
-


> Iceberg hive catalog should create table object with initialised SerdeParams
> 
>
> Key: HIVE-25727
> URL: https://issues.apache.org/jira/browse/HIVE-25727
> Project: Hive
>  Issue Type: Improvement
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>
> Currently, we leave serdeInfo.parameters as null when we create the table 
> object to be persisted at commit time in the Iceberg Hive catalog. We should 
> initialize the parameters with an empty map to avoid possible NPEs.




[jira] [Commented] (HIVE-25690) Fix column reorder detection for Iceberg schema evolution

2021-11-17 Thread Marton Bod (Jira)



[ 
https://issues.apache.org/jira/browse/HIVE-25690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17445157#comment-17445157
 ] 

Marton Bod commented on HIVE-25690:
---

Pushed to master. Thanks [~szita] for reviewing!

> Fix column reorder detection for Iceberg schema evolution
> -
>
> Key: HIVE-25690
> URL: https://issues.apache.org/jira/browse/HIVE-25690
> Project: Hive
>  Issue Type: Bug
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> The current algorithm for detecting schema differences between the HMS and 
> Iceberg schemas is broken when it comes to column reorders. This patch should 
> fix that and add more extensive testing.
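
The description does not say how the detection works, so the following is only a sketch of one possible approach, not the algorithm used in the patch. It assumes column names are unique and compares the relative order of the columns that exist in both schemas; moved_columns() is a hypothetical helper name.

```python
from typing import List, Tuple


def moved_columns(old: List[str], new: List[str]) -> List[Tuple[str, int, int]]:
    """Return (name, old_index, new_index) for columns whose relative position changed.

    Only columns present in both schemas are compared, so plain additions and
    drops are not reported as reorders. Assumes unique column names.
    """
    old_set, new_set = set(old), set(new)
    common_old = [c for c in old if c in new_set]
    common_new = [c for c in new if c in old_set]
    old_pos = {c: i for i, c in enumerate(common_old)}
    return [(c, old_pos[c], i) for i, c in enumerate(common_new) if old_pos[c] != i]
```

Filtering to the common columns first is a deliberate choice in this sketch: comparing raw positions would flag every column after an added or dropped one as moved.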




[jira] [Resolved] (HIVE-25690) Fix column reorder detection for Iceberg schema evolution

2021-11-17 Thread Marton Bod (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-25690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marton Bod resolved HIVE-25690.
---
Resolution: Fixed

> Fix column reorder detection for Iceberg schema evolution
> -
>
> Key: HIVE-25690
> URL: https://issues.apache.org/jira/browse/HIVE-25690
> Project: Hive
>  Issue Type: Bug
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> The current algorithm for detecting schema differences between the HMS and 
> Iceberg schemas is broken when it comes to column reorders. This patch should 
> fix that and add more extensive testing.




[jira] [Assigned] (HIVE-25690) Fix column reorder detection for Iceberg schema evolution

2021-11-11 Thread Marton Bod (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-25690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marton Bod reassigned HIVE-25690:
-


> Fix column reorder detection for Iceberg schema evolution
> -
>
> Key: HIVE-25690
> URL: https://issues.apache.org/jira/browse/HIVE-25690
> Project: Hive
>  Issue Type: Bug
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>
> The current algorithm for detecting schema differences between the HMS and 
> Iceberg schemas is broken when it comes to column reorders. This patch should 
> fix that and add more extensive testing.




[jira] [Resolved] (HIVE-25658) Fix regex for masking totalSize table properties in Iceberg q-tests

2021-10-28 Thread Marton Bod (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-25658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marton Bod resolved HIVE-25658.
---
Resolution: Fixed

> Fix regex for masking totalSize table properties in Iceberg q-tests
> ---
>
> Key: HIVE-25658
> URL: https://issues.apache.org/jira/browse/HIVE-25658
> Project: Hive
>  Issue Type: Improvement
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> HIVE-25607 introduced a text replace regex for masking out the totalSize 
> table property values in Iceberg q.out files. The regex however did not cover 
> all of the props in the q.out files, so here is the fix for the regex.
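
To make the masking idea concrete, here is a small sketch. The actual regex used by the Hive q-test framework is not quoted in this issue, so the pattern, the "#Masked#" placeholder and the two sample line shapes below are assumptions for illustration only.

```python
import re

# Assumed pattern: the property name, then any run of non-word characters
# (whitespace, quotes, '='), then the numeric value that should be hidden.
TOTAL_SIZE = re.compile(r"(totalSize\W+)\d+")


def mask_total_size(q_out: str) -> str:
    """Replace every totalSize value with a fixed placeholder."""
    return TOTAL_SIZE.sub(r"\g<1>#Masked#", q_out)
```

Matching the separator with \W+ rather than a fixed delimiter is what lets one pattern cover both a tab-separated line and a quoted 'key'='value' line, which is the kind of gap the description mentions.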



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Commented] (HIVE-25658) Fix regex for masking totalSize table properties in Iceberg q-tests

2021-10-28 Thread Marton Bod (Jira)



[ 
https://issues.apache.org/jira/browse/HIVE-25658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17435485#comment-17435485
 ] 

Marton Bod commented on HIVE-25658:
---

Committed to master. Thanks [~szita] for the review!

> Fix regex for masking totalSize table properties in Iceberg q-tests
> ---
>
> Key: HIVE-25658
> URL: https://issues.apache.org/jira/browse/HIVE-25658
> Project: Hive
>  Issue Type: Improvement
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> HIVE-25607 introduced a text replace regex for masking out the totalSize 
> table property values in Iceberg q.out files. The regex however did not cover 
> all of the props in the q.out files, so here is the fix for the regex.




[jira] [Assigned] (HIVE-25658) Fix regex for masking totalSize table properties in Iceberg q-tests

2021-10-28 Thread Marton Bod (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-25658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marton Bod reassigned HIVE-25658:
-


> Fix regex for masking totalSize table properties in Iceberg q-tests
> ---
>
> Key: HIVE-25658
> URL: https://issues.apache.org/jira/browse/HIVE-25658
> Project: Hive
>  Issue Type: Improvement
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>
> HIVE-25607 introduced a text replace regex for masking out the totalSize 
> table property values in Iceberg q.out files. The regex however did not cover 
> all of the props in the q.out files, so here is the fix for the regex.




[jira] [Commented] (HIVE-25643) Disable replace cols and change col commands for migrated Iceberg tables

2021-10-26 Thread Marton Bod (Jira)



[ 
https://issues.apache.org/jira/browse/HIVE-25643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17434474#comment-17434474
 ] 

Marton Bod commented on HIVE-25643:
---

Pushed to master. Thanks [~szita] for the review!

> Disable replace cols and change col commands for migrated Iceberg tables
> 
>
> Key: HIVE-25643
> URL: https://issues.apache.org/jira/browse/HIVE-25643
> Project: Hive
>  Issue Type: Improvement
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Since the Iceberg table migration will intentionally not rewrite the data 
> files, the migrated table will end up with data files that do not contain the 
> Iceberg field IDs necessary for safe, reliable schema evolution. For this 
> reason, we should disallow the REPLACE COLUMNS and CHANGE COLUMN commands 
> for these migrated Iceberg tables. ADD COLUMNS is still permitted.
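
The rule above can be sketched as a simple guard. How Hive recognizes a migrated table is not stated in this issue, so the is_migrated flag, the AlterOp enum and check_alter_allowed() are hypothetical names used only to illustrate the check.

```python
from enum import Enum


class AlterOp(Enum):
    ADD_COLUMNS = "ADD COLUMNS"
    REPLACE_COLUMNS = "REPLACE COLUMNS"
    CHANGE_COLUMN = "CHANGE COLUMN"


# Operations that are unsafe when data files lack Iceberg field IDs.
BLOCKED_ON_MIGRATED = frozenset({AlterOp.REPLACE_COLUMNS, AlterOp.CHANGE_COLUMN})


def check_alter_allowed(op: AlterOp, is_migrated: bool) -> None:
    """Raise if the operation is not permitted on a migrated Iceberg table."""
    if is_migrated and op in BLOCKED_ON_MIGRATED:
        raise ValueError(
            "%s is not supported for migrated Iceberg tables" % op.value
        )
```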




[jira] [Resolved] (HIVE-25643) Disable replace cols and change col commands for migrated Iceberg tables

2021-10-26 Thread Marton Bod (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-25643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marton Bod resolved HIVE-25643.
---
Resolution: Fixed

> Disable replace cols and change col commands for migrated Iceberg tables
> 
>
> Key: HIVE-25643
> URL: https://issues.apache.org/jira/browse/HIVE-25643
> Project: Hive
>  Issue Type: Improvement
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Since the Iceberg table migration will intentionally not rewrite the data 
> files, the migrated table will end up with data files that do not contain the 
> Iceberg field IDs necessary for safe, reliable schema evolution. For this 
> reason, we should disallow the REPLACE COLUMNS and CHANGE COLUMN commands 
> for these migrated Iceberg tables. ADD COLUMNS is still permitted.




[jira] [Updated] (HIVE-25643) Disable replace cols and change col commands for migrated Iceberg tables

2021-10-25 Thread Marton Bod (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-25643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marton Bod updated HIVE-25643:
--
Description: Since the Iceberg table migration will intentionally not 
rewrite the data files, the migrated table will end up with data files that do 
not contain the Iceberg field IDs necessary for safe, reliable schema 
evolution. For this purpose, we should disallow the REPLACE COLUMNS and CHANGE 
COLUMN commands for these migrated Iceberg tables. ADD COLUMNS are still 
permitted.  (was: Since the Iceberg table migration will intentionally not 
rewrite the data files, the migrated table will end up with data files that do 
not contain the Iceberg field IDs necessary for safe, reliable schema 
migration. For this purpose, we should disallow the REPLACE COLUMNS and CHANGE 
COLUMN commands for these migrated Iceberg tables. ADD COLUMNS are still 
permitted.)

> Disable replace cols and change col commands for migrated Iceberg tables
> 
>
> Key: HIVE-25643
> URL: https://issues.apache.org/jira/browse/HIVE-25643
> Project: Hive
>  Issue Type: Improvement
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>
> Since the Iceberg table migration will intentionally not rewrite the data 
> files, the migrated table will end up with data files that do not contain the 
> Iceberg field IDs necessary for safe, reliable schema evolution. For this 
> reason, we should disallow the REPLACE COLUMNS and CHANGE COLUMN commands 
> for these migrated Iceberg tables. ADD COLUMNS is still permitted.




[jira] [Assigned] (HIVE-25643) Disable replace cols and change col commands for migrated Iceberg tables

2021-10-25 Thread Marton Bod (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-25643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marton Bod reassigned HIVE-25643:
-


> Disable replace cols and change col commands for migrated Iceberg tables
> 
>
> Key: HIVE-25643
> URL: https://issues.apache.org/jira/browse/HIVE-25643
> Project: Hive
>  Issue Type: Improvement
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>
> Since the Iceberg table migration will intentionally not rewrite the data 
> files, the migrated table will end up with data files that do not contain the 
> Iceberg field IDs necessary for safe, reliable schema migration. For this 
> reason, we should disallow the REPLACE COLUMNS and CHANGE COLUMN commands 
> for these migrated Iceberg tables. ADD COLUMNS is still permitted.




[jira] [Commented] (HIVE-25622) Change storage handler authz URI API to use the HMS table object

2021-10-19 Thread Marton Bod (Jira)



[ 
https://issues.apache.org/jira/browse/HIVE-25622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17430445#comment-17430445
 ] 

Marton Bod commented on HIVE-25622:
---

Pushed to master. Thanks [~pvary] for the review!

> Change storage handler authz URI API to use the HMS table object
> 
>
> Key: HIVE-25622
> URL: https://issues.apache.org/jira/browse/HIVE-25622
> Project: Hive
>  Issue Type: Improvement
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>
> * Moving the {{getURIForAuth}} method into {{HiveStorageHandler}} with a 
> default implementation
> * Changing its signature to accept the HMS table object instead, as it 
> provides implementations with more flexibility around constructing the URIs
> * Deleting the {{StorageAuthorizationHandler}} interface
> * Cleaning up code parts where {{getURIForAuth}} is invoked




[jira] [Resolved] (HIVE-25622) Change storage handler authz URI API to use the HMS table object

2021-10-19 Thread Marton Bod (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-25622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marton Bod resolved HIVE-25622.
---
Resolution: Fixed

> Change storage handler authz URI API to use the HMS table object
> 
>
> Key: HIVE-25622
> URL: https://issues.apache.org/jira/browse/HIVE-25622
> Project: Hive
>  Issue Type: Improvement
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>
> * Moving the {{getURIForAuth}} method into {{HiveStorageHandler}} with a 
> default implementation
> * Changing its signature to accept the HMS table object instead, as it 
> provides implementations with more flexibility around constructing the URIs
> * Deleting the {{StorageAuthorizationHandler}} interface
> * Cleaning up code parts where {{getURIForAuth}} is invoked




[jira] [Assigned] (HIVE-25622) Change storage handler authz URI API to use the HMS table object

2021-10-19 Thread Marton Bod (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-25622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marton Bod reassigned HIVE-25622:
-


> Change storage handler authz URI API to use the HMS table object
> 
>
> Key: HIVE-25622
> URL: https://issues.apache.org/jira/browse/HIVE-25622
> Project: Hive
>  Issue Type: Improvement
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>
> * Moving the {{getURIForAuth}} method into {{HiveStorageHandler}} with a 
> default implementation
> * Changing its signature to accept the HMS table object instead, as it 
> provides implementations with more flexibility around constructing the URIs
> * Deleting the {{StorageAuthorizationHandler}} interface
> * Cleaning up code parts where {{getURIForAuth}} is invoked




[jira] [Commented] (HIVE-25607) Mask totalSize table property in Iceberg q-tests

2021-10-12 Thread Marton Bod (Jira)



[ 
https://issues.apache.org/jira/browse/HIVE-25607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17427765#comment-17427765
 ] 

Marton Bod commented on HIVE-25607:
---

Pushed to master. Thanks [~szita] and [~pvary] for the reviews.

> Mask totalSize table property in Iceberg q-tests
> 
>
> Key: HIVE-25607
> URL: https://issues.apache.org/jira/browse/HIVE-25607
> Project: Hive
>  Issue Type: Improvement
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The totalSize table property can change whenever the file format version 
> changes, potentially making the q-tests flaky when they issue DESCRIBE 
> FORMATTED commands. We should mask it and not test against the exact value.




[jira] [Resolved] (HIVE-25607) Mask totalSize table property in Iceberg q-tests

2021-10-12 Thread Marton Bod (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-25607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marton Bod resolved HIVE-25607.
---
Resolution: Fixed

> Mask totalSize table property in Iceberg q-tests
> 
>
> Key: HIVE-25607
> URL: https://issues.apache.org/jira/browse/HIVE-25607
> Project: Hive
>  Issue Type: Improvement
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The totalSize table property can change whenever the file format version 
> changes, potentially making the q-tests flaky when they issue DESCRIBE 
> FORMATTED commands. We should mask it and not test against the exact value.




[jira] [Assigned] (HIVE-25607) Mask totalSize table property in Iceberg q-tests

2021-10-11 Thread Marton Bod (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-25607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marton Bod reassigned HIVE-25607:
-


> Mask totalSize table property in Iceberg q-tests
> 
>
> Key: HIVE-25607
> URL: https://issues.apache.org/jira/browse/HIVE-25607
> Project: Hive
>  Issue Type: Improvement
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>
> The totalSize table property can change whenever the file format version 
> changes, potentially making the q-tests flaky when they issue DESCRIBE 
> FORMATTED commands. We should mask it and not test against the exact value.




[jira] [Commented] (HIVE-25604) Iceberg should implement the authorization storage handler

2021-10-08 Thread Marton Bod (Jira)



[ 
https://issues.apache.org/jira/browse/HIVE-25604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17426097#comment-17426097
 ] 

Marton Bod commented on HIVE-25604:
---

Pushed to master. Thanks [~pvary] for the review!

> Iceberg should implement the authorization storage handler
> --
>
> Key: HIVE-25604
> URL: https://issues.apache.org/jira/browse/HIVE-25604
> Project: Hive
>  Issue Type: Improvement
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>
> Iceberg's StorageHandler should implement the HiveStorageAuthorizationHandler 
> interface for authorization purposes. We'll use the Iceberg table root 
> location as the basis for permission handling.




[jira] [Resolved] (HIVE-25604) Iceberg should implement the authorization storage handler

2021-10-08 Thread Marton Bod (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-25604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marton Bod resolved HIVE-25604.
---
Resolution: Fixed

> Iceberg should implement the authorization storage handler
> --
>
> Key: HIVE-25604
> URL: https://issues.apache.org/jira/browse/HIVE-25604
> Project: Hive
>  Issue Type: Improvement
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>
> Iceberg's StorageHandler should implement the HiveStorageAuthorizationHandler 
> interface for authorization purposes. We'll use the Iceberg table root 
> location as the basis for permission handling.




[jira] [Assigned] (HIVE-25604) Iceberg should implement the authorization storage handler

2021-10-08 Thread Marton Bod (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-25604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marton Bod reassigned HIVE-25604:
-


> Iceberg should implement the authorization storage handler
> --
>
> Key: HIVE-25604
> URL: https://issues.apache.org/jira/browse/HIVE-25604
> Project: Hive
>  Issue Type: Improvement
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>
> Iceberg's StorageHandler should implement the HiveStorageAuthorizationHandler 
> interface for authorization purposes. We'll use the Iceberg table root 
> location as the basis for permission handling.




[jira] [Resolved] (HIVE-25587) Disable Iceberg table migration for unsupported source file formats

2021-10-05 Thread Marton Bod (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-25587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marton Bod resolved HIVE-25587.
---
Resolution: Fixed

> Disable Iceberg table migration for unsupported source file formats
> ---
>
> Key: HIVE-25587
> URL: https://issues.apache.org/jira/browse/HIVE-25587
> Project: Hive
>  Issue Type: Improvement
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Currently, we only support migrating ORC, Parquet and Avro tables to Iceberg. 
> However, there is no check in the code to fail early for other formats (e.g. 
> text, json, rcfile), which can lead to wasted effort at best and leave the 
> source table unusable at worst. Therefore, we should check the source format 
> early and short-circuit for unsupported types.
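
The fail-early check described above might look like the following sketch. The function name, the error type and the lower-case format strings are assumptions for illustration; only the list of supported formats (ORC, Parquet, Avro) comes from the description.

```python
# Formats named as supported in the issue description.
SUPPORTED_SOURCE_FORMATS = frozenset({"orc", "parquet", "avro"})


def validate_source_format(file_format: str) -> str:
    """Return the normalized format, or raise before any migration work starts."""
    normalized = file_format.strip().lower()
    if normalized not in SUPPORTED_SOURCE_FORMATS:
        raise ValueError(
            "Cannot migrate table with source format '%s'; supported formats: %s"
            % (file_format, ", ".join(sorted(SUPPORTED_SOURCE_FORMATS)))
        )
    return normalized
```

Calling such a check as the very first step means an unsupported table is rejected before anything about it has been modified.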




[jira] [Commented] (HIVE-25587) Disable Iceberg table migration for unsupported source file formats

2021-10-05 Thread Marton Bod (Jira)



[ 
https://issues.apache.org/jira/browse/HIVE-25587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17424391#comment-17424391
 ] 

Marton Bod commented on HIVE-25587:
---

Pushed to master. Thanks [~szita] and [~pvary] for the reviews.

> Disable Iceberg table migration for unsupported source file formats
> ---
>
> Key: HIVE-25587
> URL: https://issues.apache.org/jira/browse/HIVE-25587
> Project: Hive
>  Issue Type: Improvement
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Currently, we only support migrating ORC, Parquet and Avro tables to Iceberg. 
> However, there is no check in the code to fail early for other formats (e.g. 
> text, json, rcfile), which can lead to wasted effort at best and leave the 
> source table unusable at worst. Therefore, we should check the source format 
> early and short-circuit for unsupported types.




[jira] [Commented] (HIVE-25581) Iceberg storage handler should set common projection pruning config

2021-10-02 Thread Marton Bod (Jira)



[ 
https://issues.apache.org/jira/browse/HIVE-25581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17423573#comment-17423573
 ] 

Marton Bod commented on HIVE-25581:
---

Pushed to master. Thanks [~pvary] and [~szita] for the review.

> Iceberg storage handler should set common projection pruning config
> ---
>
> Key: HIVE-25581
> URL: https://issues.apache.org/jira/browse/HIVE-25581
> Project: Hive
>  Issue Type: Improvement
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Currently the value for the config "tez.mrreader.config.update.properties" is 
> not set for Iceberg jobs, when in fact it needs to be part of the jobConf for 
> all Iceberg queries. This change should ensure it's set by the Iceberg 
> storage handler by default.




[jira] [Resolved] (HIVE-25581) Iceberg storage handler should set common projection pruning config

2021-10-02 Thread Marton Bod (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-25581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marton Bod resolved HIVE-25581.
---
Resolution: Fixed

> Iceberg storage handler should set common projection pruning config
> ---
>
> Key: HIVE-25581
> URL: https://issues.apache.org/jira/browse/HIVE-25581
> Project: Hive
>  Issue Type: Improvement
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Currently the value for the config "tez.mrreader.config.update.properties" is 
> not set for Iceberg jobs, when in fact it needs to be part of the jobConf for 
> all Iceberg queries. This change should ensure it's set by the Iceberg 
> storage handler by default.




[jira] [Assigned] (HIVE-25587) Disable Iceberg table migration for unsupported source file formats

2021-10-01 Thread Marton Bod (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-25587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marton Bod reassigned HIVE-25587:
-


> Disable Iceberg table migration for unsupported source file formats
> ---
>
> Key: HIVE-25587
> URL: https://issues.apache.org/jira/browse/HIVE-25587
> Project: Hive
>  Issue Type: Improvement
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>
> Currently, we only support migrating ORC, Parquet and Avro tables to Iceberg. 
> However, there is no check in the code to fail early for other formats (e.g. 
> text, json, rcfile), which can lead to wasted effort at best and leave the 
> source table unusable at worst. Therefore, we should check the source format 
> early and short-circuit for unsupported types.




[jira] [Assigned] (HIVE-25581) Iceberg storage handler should set common projection pruning config

2021-09-30 Thread Marton Bod (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-25581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marton Bod reassigned HIVE-25581:
-


> Iceberg storage handler should set common projection pruning config
> ---
>
> Key: HIVE-25581
> URL: https://issues.apache.org/jira/browse/HIVE-25581
> Project: Hive
>  Issue Type: Improvement
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>
> Currently the value for the config "tez.mrreader.config.update.properties" is 
> not set for Iceberg jobs, when in fact it needs to be part of the jobConf for 
> all Iceberg queries. This change should ensure it's set by the Iceberg 
> storage handler by default.




[jira] [Resolved] (HIVE-25529) Add tests for reading/writing Iceberg V2 tables with delete files

2021-09-17 Thread Marton Bod (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-25529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marton Bod resolved HIVE-25529.
---
Resolution: Fixed

> Add tests for reading/writing Iceberg V2 tables with delete files
> -
>
> Key: HIVE-25529
> URL: https://issues.apache.org/jira/browse/HIVE-25529
> Project: Hive
>  Issue Type: Task
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Since Iceberg V2 tables are now official, we can start testing out whether V2 
> tables can be created/read/written by Hive. While Hive has no delete 
> statement yet on Iceberg tables, we can nonetheless use the Iceberg API to 
> create delete files manually and then check if Hive honors those deletes.




[jira] [Commented] (HIVE-25529) Add tests for reading/writing Iceberg V2 tables with delete files

2021-09-17 Thread Marton Bod (Jira)



[ 
https://issues.apache.org/jira/browse/HIVE-25529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17416516#comment-17416516
 ] 

Marton Bod commented on HIVE-25529:
---

Pushed to master, thanks for the review [~pvary]!

> Add tests for reading/writing Iceberg V2 tables with delete files
> -
>
> Key: HIVE-25529
> URL: https://issues.apache.org/jira/browse/HIVE-25529
> Project: Hive
>  Issue Type: Task
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Since Iceberg V2 tables are now official, we can start testing out whether V2 
> tables can be created/read/written by Hive. While Hive has no delete 
> statement yet on Iceberg tables, we can nonetheless use the Iceberg API to 
> create delete files manually and then check if Hive honors those deletes.




[jira] [Assigned] (HIVE-25529) Add tests for reading/writing Iceberg V2 tables with delete files

2021-09-16 Thread Marton Bod (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-25529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marton Bod reassigned HIVE-25529:
-


> Add tests for reading/writing Iceberg V2 tables with delete files
> -
>
> Key: HIVE-25529
> URL: https://issues.apache.org/jira/browse/HIVE-25529
> Project: Hive
>  Issue Type: Task
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>
> Since Iceberg V2 tables are now official, we can start testing out whether V2 
> tables can be created/read/written by Hive. While Hive has no delete 
> statement yet on Iceberg tables, we can nonetheless use the Iceberg API to 
> create delete files manually and then check if Hive honors those deletes.




[jira] [Commented] (HIVE-25486) Upgrade to Iceberg 0.12.0

2021-08-27 Thread Marton Bod (Jira)



[ 
https://issues.apache.org/jira/browse/HIVE-25486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17405848#comment-17405848
 ] 

Marton Bod commented on HIVE-25486:
---

Pushed to master. Thanks [~pvary] for the review!

> Upgrade to Iceberg 0.12.0
> -
>
> Key: HIVE-25486
> URL: https://issues.apache.org/jira/browse/HIVE-25486
> Project: Hive
>  Issue Type: Improvement
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>





[jira] [Resolved] (HIVE-25486) Upgrade to Iceberg 0.12.0

2021-08-27 Thread Marton Bod (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-25486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marton Bod resolved HIVE-25486.
---
Resolution: Fixed

> Upgrade to Iceberg 0.12.0
> -
>
> Key: HIVE-25486
> URL: https://issues.apache.org/jira/browse/HIVE-25486
> Project: Hive
>  Issue Type: Improvement
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>





[jira] [Updated] (HIVE-25486) Upgrade to Iceberg 0.12.0

2021-08-27 Thread Marton Bod (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-25486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marton Bod updated HIVE-25486:
--
Summary: Upgrade to Iceberg 0.12.0  (was: Upgrade to I)

> Upgrade to Iceberg 0.12.0
> -
>
> Key: HIVE-25486
> URL: https://issues.apache.org/jira/browse/HIVE-25486
> Project: Hive
>  Issue Type: Improvement
>Reporter: Marton Bod
>Priority: Major
>





[jira] [Assigned] (HIVE-25486) Upgrade to Iceberg 0.12.0

2021-08-27 Thread Marton Bod (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-25486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marton Bod reassigned HIVE-25486:
-

Assignee: Marton Bod

> Upgrade to Iceberg 0.12.0
> -
>
> Key: HIVE-25486
> URL: https://issues.apache.org/jira/browse/HIVE-25486
> Project: Hive
>  Issue Type: Improvement
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>





[jira] [Commented] (HIVE-25328) Limit scope of REPLACE COLUMNS for Iceberg tables

2021-08-05 Thread Marton Bod (Jira)



[ 
https://issues.apache.org/jira/browse/HIVE-25328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17393777#comment-17393777
 ] 

Marton Bod commented on HIVE-25328:
---

Pushed to master. Thanks for the reviews, [~szita], [~pvary]!

> Limit scope of REPLACE COLUMNS for Iceberg tables
> -
>
> Key: HIVE-25328
> URL: https://issues.apache.org/jira/browse/HIVE-25328
> Project: Hive
>  Issue Type: Improvement
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> REPLACE COLUMNS is a rather broad operation that can make heavy-weight 
> schema changes. For Iceberg tables, we want to allow this operation only for 
> dropping columns. For other changes (adding cols, renaming, type 
> promotion etc.), we should use the CHANGE COLUMN command.
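
The restriction described above can be pictured as a simple check on the old and new column lists. The following is only an illustrative sketch, not Hive's implementation: the function name `is_drop_only` and the (name, type) tuple representation are hypothetical, and it assumes that the surviving columns must keep their original order, which the issue text does not state.

```python
def is_drop_only(old_columns, new_columns):
    """Return True if new_columns equals old_columns with zero or more
    columns removed. Each column is a (name, type) tuple.

    Assumption (not from the issue): surviving columns keep their order.
    """
    remaining = iter(old_columns)
    # "col in remaining" advances the iterator until it finds col, so this
    # checks that new_columns is an ordered subsequence of old_columns.
    return all(col in remaining for col in new_columns)


old = [("id", "int"), ("name", "string"), ("age", "int")]

dropped = is_drop_only(old, [("id", "int"), ("age", "int")])            # column dropped
renamed = is_drop_only(old, [("id", "int"), ("full_name", "string")])   # rename
promoted = is_drop_only(old, [("id", "bigint"), ("name", "string")])    # type promotion
added = is_drop_only(old, old + [("city", "string")])                   # new column
```

Under these assumptions only the first call is accepted; renames, type promotions and additions are rejected and would go through CHANGE COLUMN instead.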




[jira] [Resolved] (HIVE-25328) Limit scope of REPLACE COLUMNS for Iceberg tables

2021-08-05 Thread Marton Bod (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-25328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marton Bod resolved HIVE-25328.
---
Resolution: Fixed

> Limit scope of REPLACE COLUMNS for Iceberg tables
> -
>
> Key: HIVE-25328
> URL: https://issues.apache.org/jira/browse/HIVE-25328
> Project: Hive
>  Issue Type: Improvement
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> REPLACE COLUMNS is a rather broad operation that can make heavy-weight 
> schema changes. For Iceberg tables, we want to allow this operation only for 
> dropping columns. For other changes (adding cols, renaming, type 
> promotion etc.), we should use the CHANGE COLUMN command.




[jira] [Assigned] (HIVE-25354) Handle unsupported queries for Iceberg tables

2021-07-20 Thread Marton Bod (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-25354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marton Bod reassigned HIVE-25354:
-


> Handle unsupported queries for Iceberg tables
> -
>
> Key: HIVE-25354
> URL: https://issues.apache.org/jira/browse/HIVE-25354
> Project: Hive
>  Issue Type: Improvement
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>
> In Iceberg, there will be several Hive commands that are unsupported, 
> either temporarily or permanently. For example, right now all commands containing 
> the PARTITION keyword fail Semantic Analysis because the HMS 
> table is always unpartitioned for Iceberg. The resulting error message is 
> confusing for the user. We should provide common, unified error 
> handling for these queries.




[jira] [Assigned] (HIVE-25328) Limit scope of REPLACE COLUMNS for Iceberg tables

2021-07-14 Thread Marton Bod (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-25328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marton Bod reassigned HIVE-25328:
-


> Limit scope of REPLACE COLUMNS for Iceberg tables
> -
>
> Key: HIVE-25328
> URL: https://issues.apache.org/jira/browse/HIVE-25328
> Project: Hive
>  Issue Type: Improvement
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>
> REPLACE COLUMNS is a rather broad operation that can make heavy-weight 
> schema changes. For Iceberg tables, we want to allow this operation only for 
> dropping columns. For other changes (adding cols, renaming, type 
> promotion etc.), we should use the CHANGE COLUMN command.




[jira] [Resolved] (HIVE-25308) Use new Tez API to get JobID for Iceberg commits

2021-07-08 Thread Marton Bod (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-25308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marton Bod resolved HIVE-25308.
---
Resolution: Fixed

> Use new Tez API to get JobID for Iceberg commits
> 
>
> Key: HIVE-25308
> URL: https://issues.apache.org/jira/browse/HIVE-25308
> Project: Hive
>  Issue Type: Improvement
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> When committing Iceberg writes, we currently have only the JobID without the 
> vertexID, so we have to list the folder {{/temp}} first 
> and parse out the full JobIDs (incl. vertexID) from the resulting 
> folder names.
> Now that Tez 0.10.1 is released, there is a new API we can call to acquire the 
> full JobID, making the file listing unnecessary.




[jira] [Assigned] (HIVE-25308) Use new Tez API to get JobID for Iceberg commits

2021-07-06 Thread Marton Bod (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-25308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marton Bod reassigned HIVE-25308:
-


> Use new Tez API to get JobID for Iceberg commits
> 
>
> Key: HIVE-25308
> URL: https://issues.apache.org/jira/browse/HIVE-25308
> Project: Hive
>  Issue Type: Improvement
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>
> When committing Iceberg writes, we currently have only the JobID without the 
> vertexID, so we have to list the folder {{/temp}} first 
> and parse out the full JobIDs (incl. vertexID) from the resulting 
> folder names.
> Now that Tez 0.10.1 is released, there is a new API we can call to acquire the 
> full JobID, making the file listing unnecessary.




[jira] [Assigned] (HIVE-25265) Fix TestHiveIcebergStorageHandlerWithEngine

2021-06-21 Thread Marton Bod (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-25265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marton Bod reassigned HIVE-25265:
-

Assignee: Marton Bod

> Fix TestHiveIcebergStorageHandlerWithEngine
> ---
>
> Key: HIVE-25265
> URL: https://issues.apache.org/jira/browse/HIVE-25265
> Project: Hive
>  Issue Type: Bug
>Reporter: Zoltan Haindrich
>Assignee: Marton Bod
>Priority: Major
>
> test is unstable:
> http://ci.hive.apache.org/job/hive-flaky-check/251/




[jira] [Assigned] (HIVE-25256) Support ALTER TABLE CHANGE COLUMN for Iceberg

2021-06-16 Thread Marton Bod (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-25256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marton Bod reassigned HIVE-25256:
-


> Support ALTER TABLE CHANGE COLUMN for Iceberg
> -
>
> Key: HIVE-25256
> URL: https://issues.apache.org/jira/browse/HIVE-25256
> Project: Hive
>  Issue Type: New Feature
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>
> In order to provide support for renaming/changing the data type of a single 
> column, we should add alter table change column support for Iceberg tables.




[jira] [Assigned] (HIVE-25255) Support ALTER TABLE REPLACE COLUMNS for Iceberg

2021-06-16 Thread Marton Bod (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-25255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marton Bod reassigned HIVE-25255:
-


> Support ALTER TABLE REPLACE COLUMNS for Iceberg
> ---
>
> Key: HIVE-25255
> URL: https://issues.apache.org/jira/browse/HIVE-25255
> Project: Hive
>  Issue Type: New Feature
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>





[jira] [Resolved] (HIVE-25226) Hive changes 'storage_handler' for existing Iceberg table when hive.engine.enabled is false

2021-06-10 Thread Marton Bod (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-25226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marton Bod resolved HIVE-25226.
---
Resolution: Won't Fix

> Hive changes 'storage_handler' for existing Iceberg table when 
> hive.engine.enabled is false
> ---
>
> Key: HIVE-25226
> URL: https://issues.apache.org/jira/browse/HIVE-25226
> Project: Hive
>  Issue Type: Bug
>Reporter: Zoltán Borók-Nagy
>Assignee: Marton Bod
>Priority: Major
>  Labels: iceberg
>
> If Hive writes to an existing Iceberg table but property 
> 'hive.engine.enabled' is not set, then Hive rewrites the table metadata with 
> different SerDe/Input/Output format than it had before.
> E.g. there's an existing table with the following metadata:
> {noformat}
>   storage_handler | org.apache.iceberg.mr.hive.HiveIcebergStorageHandler
> | SerDe Library: | org.apache.iceberg.mr.hive.HiveIcebergSerDe | NULL |
> | InputFormat:   | org.apache.iceberg.mr.hive.HiveIcebergInputFormat | NULL |
> | OutputFormat:  | org.apache.iceberg.mr.hive.HiveIcebergOutputFormat | NULL |
> {noformat}
> Now when Hive inserts to this table it clears 'storage_handler' and rewrites 
> the rest:
> {noformat}
> | SerDe Library: | org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe | NULL |
> | InputFormat:   | org.apache.hadoop.mapred.FileInputFormat | NULL |
> | OutputFormat:  | org.apache.hadoop.mapred.FileOutputFormat | NULL |
> {noformat}
> This means the table becomes unreadable:
> {noformat}
> Error: java.io.IOException: java.io.IOException: Cannot create an instance of 
> InputFormat class org.apache.hadoop.mapred.FileInputFormat as specified in 
> mapredWork! (state=,code=0)
> {noformat}
> I think Hive should always set 'hive.engine.enabled' for Iceberg.




[jira] [Commented] (HIVE-25226) Hive changes 'storage_handler' for existing Iceberg table when hive.engine.enabled is false

2021-06-10 Thread Marton Bod (Jira)



[ 
https://issues.apache.org/jira/browse/HIVE-25226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17360701#comment-17360701
 ] 

Marton Bod commented on HIVE-25226:
---

As discussed with [~boroknagyz] this is currently the expected behaviour if 
engine.hive.enabled is not set to true in the table properties. This will be 
handled by: https://issues.apache.org/jira/browse/IMPALA-10741

> Hive changes 'storage_handler' for existing Iceberg table when 
> hive.engine.enabled is false
> ---
>
> Key: HIVE-25226
> URL: https://issues.apache.org/jira/browse/HIVE-25226
> Project: Hive
>  Issue Type: Bug
>Reporter: Zoltán Borók-Nagy
>Assignee: Marton Bod
>Priority: Major
>  Labels: iceberg
>
> If Hive writes to an existing Iceberg table but property 
> 'hive.engine.enabled' is not set, then Hive rewrites the table metadata with 
> different SerDe/Input/Output format than it had before.
> E.g. there's an existing table with the following metadata:
> {noformat}
>   storage_handler | org.apache.iceberg.mr.hive.HiveIcebergStorageHandler
> | SerDe Library: | org.apache.iceberg.mr.hive.HiveIcebergSerDe | NULL |
> | InputFormat:   | org.apache.iceberg.mr.hive.HiveIcebergInputFormat | NULL |
> | OutputFormat:  | org.apache.iceberg.mr.hive.HiveIcebergOutputFormat | NULL |
> {noformat}
> Now when Hive inserts to this table it clears 'storage_handler' and rewrites 
> the rest:
> {noformat}
> | SerDe Library: | org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe | NULL |
> | InputFormat:   | org.apache.hadoop.mapred.FileInputFormat | NULL |
> | OutputFormat:  | org.apache.hadoop.mapred.FileOutputFormat | NULL |
> {noformat}
> This means the table becomes unreadable:
> {noformat}
> Error: java.io.IOException: java.io.IOException: Cannot create an instance of 
> InputFormat class org.apache.hadoop.mapred.FileInputFormat as specified in 
> mapredWork! (state=,code=0)
> {noformat}
> I think Hive should always set 'hive.engine.enabled' for Iceberg.




[jira] [Assigned] (HIVE-25226) Hive changes 'storage_handler' for existing Iceberg table when hive.engine.enabled is false

2021-06-09 Thread Marton Bod (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-25226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marton Bod reassigned HIVE-25226:
-

Assignee: Marton Bod

> Hive changes 'storage_handler' for existing Iceberg table when 
> hive.engine.enabled is false
> ---
>
> Key: HIVE-25226
> URL: https://issues.apache.org/jira/browse/HIVE-25226
> Project: Hive
>  Issue Type: Bug
>Reporter: Zoltán Borók-Nagy
>Assignee: Marton Bod
>Priority: Major
>  Labels: iceberg
>
> If Hive writes to an existing Iceberg table but property 
> 'hive.engine.enabled' is not set, then Hive rewrites the table metadata with 
> different SerDe/Input/Output format than it had before.
> E.g. there's an existing table with the following metadata:
> {noformat}
>   storage_handler | org.apache.iceberg.mr.hive.HiveIcebergStorageHandler
> | SerDe Library: | org.apache.iceberg.mr.hive.HiveIcebergSerDe | NULL |
> | InputFormat:   | org.apache.iceberg.mr.hive.HiveIcebergInputFormat | NULL |
> | OutputFormat:  | org.apache.iceberg.mr.hive.HiveIcebergOutputFormat | NULL |
> {noformat}
> Now when Hive inserts to this table it clears 'storage_handler' and rewrites 
> the rest:
> {noformat}
> | SerDe Library: | org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe | NULL |
> | InputFormat:   | org.apache.hadoop.mapred.FileInputFormat | NULL |
> | OutputFormat:  | org.apache.hadoop.mapred.FileOutputFormat | NULL |
> {noformat}
> This means the table becomes unreadable:
> {noformat}
> Error: java.io.IOException: java.io.IOException: Cannot create an instance of 
> InputFormat class org.apache.hadoop.mapred.FileInputFormat as specified in 
> mapredWork! (state=,code=0)
> {noformat}
> I think Hive should always set 'hive.engine.enabled' for Iceberg.




[jira] [Assigned] (HIVE-25222) Fix reading Iceberg tables with a comma in column names

2021-06-09 Thread Marton Bod (Jira)



 [ 
https://issues.apache.org/jira/browse/HIVE-25222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marton Bod reassigned HIVE-25222:
-


> Fix reading Iceberg tables with a comma in column names
> ---
>
> Key: HIVE-25222
> URL: https://issues.apache.org/jira/browse/HIVE-25222
> Project: Hive
>  Issue Type: Bug
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>
> When using a table with a column name containing a comma (e.g. `employ,ee`), 
> reading an Iceberg table fails because we rely on the property 
> "hive.io.file.readcolumn.names" which encodes the read columns in a 
> comma-separated list, put together by the ColumnProjectionUtils class.
> Because it's comma-separated in all cases, it will produce a string like: 
> "id,birth_date,employ,ee" which can cause problems for Iceberg readers which 
> use this string list to construct their expected read schema.
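
The ambiguity described above is easy to reproduce outside Hive. The sketch below is only an illustration of the encoding problem: `encode_read_columns` and `decode_read_columns` are hypothetical stand-ins for the comma-joining and comma-splitting steps, not the actual ColumnProjectionUtils or Iceberg reader code.

```python
def encode_read_columns(names):
    """Join column names into one comma-separated string (hypothetical helper)."""
    return ",".join(names)


def decode_read_columns(encoded):
    """Split the string back on commas, as a naive reader of the property would."""
    return encoded.split(",") if encoded else []


columns = ["id", "birth_date", "employ,ee"]
encoded = encode_read_columns(columns)
decoded = decode_read_columns(encoded)

print(encoded)  # id,birth_date,employ,ee
print(decoded)  # ['id', 'birth_date', 'employ', 'ee']
```

Three columns go in and four come out, so a read schema built from the decoded list would look for columns `employ` and `ee` that do not exist in the table.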



