[jira] [Resolved] (HIVE-28277) HIVE does not support update operations for ICEBERG of type location_based_table.

2024-05-22 Thread yongzhi.shao (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yongzhi.shao resolved HIVE-28277.
-
Fix Version/s: 4.0.0
   Resolution: Won't Fix

> HIVE does not support update operations for ICEBERG of type 
> location_based_table.
> -
>
> Key: HIVE-28277
> URL: https://issues.apache.org/jira/browse/HIVE-28277
> Project: Hive
>  Issue Type: Bug
>  Components: Iceberg integration
>Affects Versions: 4.0.0
> Environment: ICEBERG:1.5.2
> HIVE 4.0.0
>Reporter: yongzhi.shao
>Priority: Major
> Fix For: 4.0.0
>
>
> Currently, when I update the location_based_table using hive, hive 
> incorrectly empties all data directories and metadata directories.
> After the update statement is executed, the iceberg table is corrupted.
>  
> {code:java}
> --spark 3.4.1 + iceberg 1.5.2:
> CREATE TABLE IF NOT EXISTS datacenter.default.test_data_04 (
> id string,name string
> )
> using iceberg
> PARTITIONED BY (name)
> TBLPROPERTIES 
> ('read.orc.vectorization.enabled'='true','write.format.default'='orc','write.orc.bloom.filter.columns'='id','write.orc.compression-codec'='zstd','write.metadata.previous-versions-max'='3','write.metadata.delete-after-commit.enabled'='true');
> insert into datacenter.default.test_data_04(id,name) 
> values('1','a'),('2','b');
> --hive4:
> CREATE EXTERNAL TABLE default.test_data_04
> STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler' 
> LOCATION 'hdfs:///iceberg-catalog/warehouse/default/test_data_04'
> TBLPROPERTIES 
> ('iceberg.catalog'='location_based_table','engine.hive.enabled'='true');
> select id,name from default.test_data_04; --2 row
> update test_data_04 set name = 'adasd' where id = '1';
> ERROR:
> 2024-05-23T10:26:32,028 ERROR [HiveServer2-Background-Pool: Thread-297] 
> hive.HiveIcebergStorageHandler: Error while trying to commit job: 
> job_17061635207991_169536, job_17061635207990_169536, 
> job_17061635207992_169536, starting rollback changes for table: 
> default.test_data_04
> org.apache.iceberg.exceptions.NoSuchTableException: Table does not exist at 
> location: /iceberg-catalog/warehouse/default/test_data_04
> BEFORE UPDATE:
> ICEBERG TABLE DIR:
> [root@ ~]# hdfs dfs -ls /iceberg-catalog/warehouse/default/test_data_04
> Found 2 items
> drwxr-xr-x   - hive hdfs          0 2024-05-23 09:26 
> /iceberg-catalog/warehouse/default/test_data_04/data
> drwxr-xr-x   - hive hdfs          0 2024-05-23 09:26 
> /iceberg-catalog/warehouse/default/test_data_04/metadata
> AFTER UPDATE:
> ICEBERG TABLE DIR:
> [root@XXX ~]# hdfs dfs -ls /iceberg-catalog/warehouse/default/test_data_04
> Found 3 items
> drwxr-xr-x   - hive hdfs          0 2024-05-23 10:26 
> /iceberg-catalog/warehouse/default/test_data_04/-tmp.HIVE_UNION_SUBDIR_1
> drwxr-xr-x   - hive hdfs          0 2024-05-23 10:26 
> /iceberg-catalog/warehouse/default/test_data_04/-tmp.HIVE_UNION_SUBDIR_2
> drwxr-xr-x   - hive hdfs          0 2024-05-23 10:26 
> /iceberg-catalog/warehouse/default/test_data_04/-tmp.HIVE_UNION_SUBDIR_3
> {code}
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
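
For context on what a location_based_table is: with 'iceberg.catalog'='location_based_table', Hive loads the Iceberg table directly from its HDFS path rather than through a metastore catalog. Below is a minimal, hedged sketch of what such a location-based load boils down to with the public Iceberg API (illustrative only, not the HiveIcebergStorageHandler code path). Once the metadata/ directory under the location has been emptied, the load fails with the same NoSuchTableException that appears in the log above.

{code:java}
// Illustrative sketch only, not the HiveIcebergStorageHandler code path.
import org.apache.hadoop.conf.Configuration;
import org.apache.iceberg.Table;
import org.apache.iceberg.exceptions.NoSuchTableException;
import org.apache.iceberg.hadoop.HadoopTables;

public class LocationBasedLoadSketch {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    String location = "hdfs:///iceberg-catalog/warehouse/default/test_data_04";
    try {
      // HadoopTables resolves the table from metadata/v*.metadata.json under the path.
      Table table = new HadoopTables(conf).load(location);
      System.out.println("current snapshot: " + table.currentSnapshot());
    } catch (NoSuchTableException e) {
      // The state the table is left in once its metadata directory is gone.
      System.out.println(e.getMessage());
    }
  }
}
{code}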


[jira] [Commented] (HIVE-28277) HIVE does not support update operations for ICEBERG of type location_based_table.

2024-05-22 Thread yongzhi.shao (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-28277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17848817#comment-17848817
 ] 

yongzhi.shao commented on HIVE-28277:
-

I've updated the code and the problem has indeed gone away. Thank you.

> HIVE does not support update operations for ICEBERG of type 
> location_based_table.
> -
>
> Key: HIVE-28277
> URL: https://issues.apache.org/jira/browse/HIVE-28277
> Project: Hive
>  Issue Type: Bug
>  Components: Iceberg integration
>Affects Versions: 4.0.0
> Environment: ICEBERG:1.5.2
> HIVE 4.0.0
>Reporter: yongzhi.shao
>Priority: Major
>
> Currently, when I update the location_based_table using hive, hive 
> incorrectly empties all data directories and metadata directories.
> After the update statement is executed, the iceberg table is corrupted.
>  
> {code:java}
> --spark 3.4.1 + iceberg 1.5.2:
> CREATE TABLE IF NOT EXISTS datacenter.default.test_data_04 (
> id string,name string
> )
> using iceberg
> PARTITIONED BY (name)
> TBLPROPERTIES 
> ('read.orc.vectorization.enabled'='true','write.format.default'='orc','write.orc.bloom.filter.columns'='id','write.orc.compression-codec'='zstd','write.metadata.previous-versions-max'='3','write.metadata.delete-after-commit.enabled'='true');
> insert into datacenter.default.test_data_04(id,name) 
> values('1','a'),('2','b');
> --hive4:
> CREATE EXTERNAL TABLE default.test_data_04
> STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler' 
> LOCATION 'hdfs:///iceberg-catalog/warehouse/default/test_data_04'
> TBLPROPERTIES 
> ('iceberg.catalog'='location_based_table','engine.hive.enabled'='true');
> select id,name from default.test_data_04; --2 row
> update test_data_04 set name = 'adasd' where id = '1';
> ERROR:
> 2024-05-23T10:26:32,028 ERROR [HiveServer2-Background-Pool: Thread-297] 
> hive.HiveIcebergStorageHandler: Error while trying to commit job: 
> job_17061635207991_169536, job_17061635207990_169536, 
> job_17061635207992_169536, starting rollback changes for table: 
> default.test_data_04
> org.apache.iceberg.exceptions.NoSuchTableException: Table does not exist at 
> location: /iceberg-catalog/warehouse/default/test_data_04
> BEFORE UPDATE:
> ICEBERG TABLE DIR:
> [root@ ~]# hdfs dfs -ls /iceberg-catalog/warehouse/default/test_data_04
> Found 2 items
> drwxr-xr-x   - hive hdfs          0 2024-05-23 09:26 
> /iceberg-catalog/warehouse/default/test_data_04/data
> drwxr-xr-x   - hive hdfs          0 2024-05-23 09:26 
> /iceberg-catalog/warehouse/default/test_data_04/metadata
> AFTER UPDATE:
> ICEBERG TABLE DIR:
> [root@XXX ~]# hdfs dfs -ls /iceberg-catalog/warehouse/default/test_data_04
> Found 3 items
> drwxr-xr-x   - hive hdfs          0 2024-05-23 10:26 
> /iceberg-catalog/warehouse/default/test_data_04/-tmp.HIVE_UNION_SUBDIR_1
> drwxr-xr-x   - hive hdfs          0 2024-05-23 10:26 
> /iceberg-catalog/warehouse/default/test_data_04/-tmp.HIVE_UNION_SUBDIR_2
> drwxr-xr-x   - hive hdfs          0 2024-05-23 10:26 
> /iceberg-catalog/warehouse/default/test_data_04/-tmp.HIVE_UNION_SUBDIR_3
> {code}
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (HIVE-28277) HIVE does not support update operations for ICEBERG of type location_based_table.

2024-05-22 Thread yongzhi.shao (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-28277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17848817#comment-17848817
 ] 

yongzhi.shao edited comment on HIVE-28277 at 5/23/24 5:15 AM:
--

I've updated the code and the problem has gone away. Thank you, sir.


was (Author: lisoda):
I've updated the code and the problem has indeed gone away. Thank you.

> HIVE does not support update operations for ICEBERG of type 
> location_based_table.
> -
>
> Key: HIVE-28277
> URL: https://issues.apache.org/jira/browse/HIVE-28277
> Project: Hive
>  Issue Type: Bug
>  Components: Iceberg integration
>Affects Versions: 4.0.0
> Environment: ICEBERG:1.5.2
> HIVE 4.0.0
>Reporter: yongzhi.shao
>Priority: Major
>
> Currently, when I update the location_based_table using hive, hive 
> incorrectly empties all data directories and metadata directories.
> After the update statement is executed, the iceberg table is corrupted.
>  
> {code:java}
> --spark 3.4.1 + iceberg 1.5.2:
> CREATE TABLE IF NOT EXISTS datacenter.default.test_data_04 (
> id string,name string
> )
> using iceberg
> PARTITIONED BY (name)
> TBLPROPERTIES 
> ('read.orc.vectorization.enabled'='true','write.format.default'='orc','write.orc.bloom.filter.columns'='id','write.orc.compression-codec'='zstd','write.metadata.previous-versions-max'='3','write.metadata.delete-after-commit.enabled'='true');
> insert into datacenter.default.test_data_04(id,name) 
> values('1','a'),('2','b');
> --hive4:
> CREATE EXTERNAL TABLE default.test_data_04
> STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler' 
> LOCATION 'hdfs:///iceberg-catalog/warehouse/default/test_data_04'
> TBLPROPERTIES 
> ('iceberg.catalog'='location_based_table','engine.hive.enabled'='true');
> select id,name from default.test_data_04; --2 row
> update test_data_04 set name = 'adasd' where id = '1';
> ERROR:
> 2024-05-23T10:26:32,028 ERROR [HiveServer2-Background-Pool: Thread-297] 
> hive.HiveIcebergStorageHandler: Error while trying to commit job: 
> job_17061635207991_169536, job_17061635207990_169536, 
> job_17061635207992_169536, starting rollback changes for table: 
> default.test_data_04
> org.apache.iceberg.exceptions.NoSuchTableException: Table does not exist at 
> location: /iceberg-catalog/warehouse/default/test_data_04
> BEFORE UPDATE:
> ICEBERG TABLE DIR:
> [root@ ~]# hdfs dfs -ls /iceberg-catalog/warehouse/default/test_data_04
> Found 2 items
> drwxr-xr-x   - hive hdfs          0 2024-05-23 09:26 
> /iceberg-catalog/warehouse/default/test_data_04/data
> drwxr-xr-x   - hive hdfs          0 2024-05-23 09:26 
> /iceberg-catalog/warehouse/default/test_data_04/metadata
> AFTER UPDATE:
> ICEBERG TABLE DIR:
> [root@XXX ~]# hdfs dfs -ls /iceberg-catalog/warehouse/default/test_data_04
> Found 3 items
> drwxr-xr-x   - hive hdfs          0 2024-05-23 10:26 
> /iceberg-catalog/warehouse/default/test_data_04/-tmp.HIVE_UNION_SUBDIR_1
> drwxr-xr-x   - hive hdfs          0 2024-05-23 10:26 
> /iceberg-catalog/warehouse/default/test_data_04/-tmp.HIVE_UNION_SUBDIR_2
> drwxr-xr-x   - hive hdfs          0 2024-05-23 10:26 
> /iceberg-catalog/warehouse/default/test_data_04/-tmp.HIVE_UNION_SUBDIR_3
> {code}
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-28277) HIVE does not support update operations for ICEBERG of type location_based_table.

2024-05-22 Thread Butao Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-28277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17848814#comment-17848814
 ] 

Butao Zhang commented on HIVE-28277:


I couldn't reproduce this issue on hive4/master; maybe it's some other environment problem...

> HIVE does not support update operations for ICEBERG of type 
> location_based_table.
> -
>
> Key: HIVE-28277
> URL: https://issues.apache.org/jira/browse/HIVE-28277
> Project: Hive
>  Issue Type: Bug
>  Components: Iceberg integration
>Affects Versions: 4.0.0
> Environment: ICEBERG:1.5.2
> HIVE 4.0.0
>Reporter: yongzhi.shao
>Priority: Major
>
> Currently, when I update the location_based_table using hive, hive 
> incorrectly empties all data directories and metadata directories.
> After the update statement is executed, the iceberg table is corrupted.
>  
> {code:java}
> --spark 3.4.1 + iceberg 1.5.2:
> CREATE TABLE IF NOT EXISTS datacenter.default.test_data_04 (
> id string,name string
> )
> using iceberg
> PARTITIONED BY (name)
> TBLPROPERTIES 
> ('read.orc.vectorization.enabled'='true','write.format.default'='orc','write.orc.bloom.filter.columns'='id','write.orc.compression-codec'='zstd','write.metadata.previous-versions-max'='3','write.metadata.delete-after-commit.enabled'='true');
> insert into datacenter.default.test_data_04(id,name) 
> values('1','a'),('2','b');
> --hive4:
> CREATE EXTERNAL TABLE default.test_data_04
> STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler' 
> LOCATION 'hdfs:///iceberg-catalog/warehouse/default/test_data_04'
> TBLPROPERTIES 
> ('iceberg.catalog'='location_based_table','engine.hive.enabled'='true');
> select id,name from default.test_data_04; --2 row
> update test_data_04 set name = 'adasd' where id = '1';
> ERROR:
> 2024-05-23T10:26:32,028 ERROR [HiveServer2-Background-Pool: Thread-297] 
> hive.HiveIcebergStorageHandler: Error while trying to commit job: 
> job_17061635207991_169536, job_17061635207990_169536, 
> job_17061635207992_169536, starting rollback changes for table: 
> default.test_data_04
> org.apache.iceberg.exceptions.NoSuchTableException: Table does not exist at 
> location: /iceberg-catalog/warehouse/default/test_data_04
> BEFORE UPDATE:
> ICEBERG TABLE DIR:
> [root@ ~]# hdfs dfs -ls /iceberg-catalog/warehouse/default/test_data_04
> Found 2 items
> drwxr-xr-x   - hive hdfs          0 2024-05-23 09:26 
> /iceberg-catalog/warehouse/default/test_data_04/data
> drwxr-xr-x   - hive hdfs          0 2024-05-23 09:26 
> /iceberg-catalog/warehouse/default/test_data_04/metadata
> AFTER UPDATE:
> ICEBERG TABLE DIR:
> [root@XXX ~]# hdfs dfs -ls /iceberg-catalog/warehouse/default/test_data_04
> Found 3 items
> drwxr-xr-x   - hive hdfs          0 2024-05-23 10:26 
> /iceberg-catalog/warehouse/default/test_data_04/-tmp.HIVE_UNION_SUBDIR_1
> drwxr-xr-x   - hive hdfs          0 2024-05-23 10:26 
> /iceberg-catalog/warehouse/default/test_data_04/-tmp.HIVE_UNION_SUBDIR_2
> drwxr-xr-x   - hive hdfs          0 2024-05-23 10:26 
> /iceberg-catalog/warehouse/default/test_data_04/-tmp.HIVE_UNION_SUBDIR_3
> {code}
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-28277) HIVE does not support update operations for ICEBERG of type location_based_table.

2024-05-22 Thread yongzhi.shao (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yongzhi.shao updated HIVE-28277:

Description: 
Currently, when I update the location_based_table using hive, hive incorrectly 
empties all data directories and metadata directories.

After the update statement is executed, the iceberg table is corrupted.

 
{code:java}
--spark 3.4.1 + iceberg 1.5.2:
CREATE TABLE IF NOT EXISTS datacenter.default.test_data_04 (
id string,name string
)
using iceberg
PARTITIONED BY (name)
TBLPROPERTIES 
('read.orc.vectorization.enabled'='true','write.format.default'='orc','write.orc.bloom.filter.columns'='id','write.orc.compression-codec'='zstd','write.metadata.previous-versions-max'='3','write.metadata.delete-after-commit.enabled'='true');

insert into datacenter.default.test_data_04(id,name) values('1','a'),('2','b');

--hive4:
CREATE EXTERNAL TABLE default.test_data_04
STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler' 
LOCATION 'hdfs:///iceberg-catalog/warehouse/default/test_data_04'
TBLPROPERTIES 
('iceberg.catalog'='location_based_table','engine.hive.enabled'='true');

select id,name from default.test_data_04; --2 row

update test_data_04 set name = 'adasd' where id = '1';

ERROR:
2024-05-23T10:26:32,028 ERROR [HiveServer2-Background-Pool: Thread-297] 
hive.HiveIcebergStorageHandler: Error while trying to commit job: 
job_17061635207991_169536, job_17061635207990_169536, 
job_17061635207992_169536, starting rollback changes for table: 
default.test_data_04
org.apache.iceberg.exceptions.NoSuchTableException: Table does not exist at 
location: /iceberg-catalog/warehouse/default/test_data_04


BEFORE UPDATE:
ICEBERG TABLE DIR:
[root@ ~]# hdfs dfs -ls /iceberg-catalog/warehouse/default/test_data_04
Found 2 items
drwxr-xr-x   - hive hdfs          0 2024-05-23 09:26 
/iceberg-catalog/warehouse/default/test_data_04/data
drwxr-xr-x   - hive hdfs          0 2024-05-23 09:26 
/iceberg-catalog/warehouse/default/test_data_04/metadata


AFTER UPDATE:
ICEBERG TABLE DIR:

[root@XXX ~]# hdfs dfs -ls /iceberg-catalog/warehouse/default/test_data_04
Found 3 items
drwxr-xr-x   - hive hdfs          0 2024-05-23 10:26 
/iceberg-catalog/warehouse/default/test_data_04/-tmp.HIVE_UNION_SUBDIR_1
drwxr-xr-x   - hive hdfs          0 2024-05-23 10:26 
/iceberg-catalog/warehouse/default/test_data_04/-tmp.HIVE_UNION_SUBDIR_2
drwxr-xr-x   - hive hdfs          0 2024-05-23 10:26 
/iceberg-catalog/warehouse/default/test_data_04/-tmp.HIVE_UNION_SUBDIR_3


{code}
 

 

  was:
Currently, when I update the location_based_table using hive, hive incorrectly 
empties all data directories and metadata directories.

After the update statement is executed, the iceberg table is corrupted.

 
{code:java}
--spark 3.4.1 + iceberg 1.5.2:
CREATE TABLE IF NOT EXISTS datacenter.default.test_data_04 (
id string,name string
)
using iceberg
PARTITIONED BY (name)
TBLPROPERTIES 
('read.orc.vectorization.enabled'='true','write.format.default'='orc','write.orc.bloom.filter.columns'='id','write.orc.compression-codec'='zstd','write.metadata.previous-versions-max'='3','write.metadata.delete-after-commit.enabled'='true');

insert into datacenter.default.test_data_04(id,name) values('1','a'),('2','b');

--hive4:
CREATE EXTERNAL TABLE default.test_data_04
STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler' 
LOCATION 'hdfs:///iceberg-catalog/warehouse/default/test_data_04'
TBLPROPERTIES 
('iceberg.catalog'='location_based_table','engine.hive.enabled'='true');

select distinct id,name from (select id,name from default.test_data_04 limit 
10) s1; --2 row

update test_data_04 set name = 'adasd' where id = '1';

ERROR:
2024-05-23T10:26:32,028 ERROR [HiveServer2-Background-Pool: Thread-297] 
hive.HiveIcebergStorageHandler: Error while trying to commit job: 
job_17061635207991_169536, job_17061635207990_169536, 
job_17061635207992_169536, starting rollback changes for table: 
default.test_data_04
org.apache.iceberg.exceptions.NoSuchTableException: Table does not exist at 
location: /iceberg-catalog/warehouse/default/test_data_04


BEFORE UPDATE:
ICEBERG TABLE DIR:
[root@ ~]# hdfs dfs -ls /iceberg-catalog/warehouse/default/test_data_04
Found 2 items
drwxr-xr-x   - hive hdfs          0 2024-05-23 09:26 
/iceberg-catalog/warehouse/default/test_data_04/data
drwxr-xr-x   - hive hdfs          0 2024-05-23 09:26 
/iceberg-catalog/warehouse/default/test_data_04/metadata


AFTER UPDATE:
ICEBERG TABLE DIR:

[root@XXX ~]# hdfs dfs -ls /iceberg-catalog/warehouse/default/test_data_04
Found 3 items
drwxr-xr-x   - hive hdfs          0 2024-05-23 10:26 
/iceberg-catalog/warehouse/default/test_data_04/-tmp.HIVE_UNION_SUBDIR_1
drwxr-xr-x   - hive hdfs          0 2024-05-23 10:26 
/iceberg-catalog/warehouse/default/test_data_04/-tmp.HIVE_UNION_SUBDIR_2
drwxr-xr-x   - hive hdfs          0 2024-05-23 10:26 

[jira] [Updated] (HIVE-28277) HIVE does not support update operations for ICEBERG of type location_based_table.

2024-05-22 Thread yongzhi.shao (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yongzhi.shao updated HIVE-28277:

Issue Type: Bug  (was: Improvement)

> HIVE does not support update operations for ICEBERG of type 
> location_based_table.
> -
>
> Key: HIVE-28277
> URL: https://issues.apache.org/jira/browse/HIVE-28277
> Project: Hive
>  Issue Type: Bug
>  Components: Iceberg integration
>Affects Versions: 4.0.0
> Environment: ICEBERG:1.5.2
> HIVE 4.0.0
>Reporter: yongzhi.shao
>Priority: Major
>
> Currently, when I update the location_based_table using hive, hive 
> incorrectly empties all data directories and metadata directories.
> After the update statement is executed, the iceberg table is corrupted.
>  
> {code:java}
> --spark 3.4.1 + iceberg 1.5.2:
> CREATE TABLE IF NOT EXISTS datacenter.default.test_data_04 (
> id string,name string
> )
> using iceberg
> PARTITIONED BY (name)
> TBLPROPERTIES 
> ('read.orc.vectorization.enabled'='true','write.format.default'='orc','write.orc.bloom.filter.columns'='id','write.orc.compression-codec'='zstd','write.metadata.previous-versions-max'='3','write.metadata.delete-after-commit.enabled'='true');
> insert into datacenter.default.test_data_04(id,name) 
> values('1','a'),('2','b');
> --hive4:
> CREATE EXTERNAL TABLE default.test_data_04
> STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler' 
> LOCATION 'hdfs:///iceberg-catalog/warehouse/default/test_data_04'
> TBLPROPERTIES 
> ('iceberg.catalog'='location_based_table','engine.hive.enabled'='true');
> select distinct id,name from (select id,name from default.test_data_04 limit 
> 10) s1; --2 row
> update test_data_04 set name = 'adasd' where id = '1';
> ERROR:
> 2024-05-23T10:26:32,028 ERROR [HiveServer2-Background-Pool: Thread-297] 
> hive.HiveIcebergStorageHandler: Error while trying to commit job: 
> job_17061635207991_169536, job_17061635207990_169536, 
> job_17061635207992_169536, starting rollback changes for table: 
> default.test_data_04
> org.apache.iceberg.exceptions.NoSuchTableException: Table does not exist at 
> location: /iceberg-catalog/warehouse/default/test_data_04
> BEFORE UPDATE:
> ICEBERG TABLE DIR:
> [root@ ~]# hdfs dfs -ls /iceberg-catalog/warehouse/default/test_data_04
> Found 2 items
> drwxr-xr-x   - hive hdfs          0 2024-05-23 09:26 
> /iceberg-catalog/warehouse/default/test_data_04/data
> drwxr-xr-x   - hive hdfs          0 2024-05-23 09:26 
> /iceberg-catalog/warehouse/default/test_data_04/metadata
> AFTER UPDATE:
> ICEBERG TABLE DIR:
> [root@XXX ~]# hdfs dfs -ls /iceberg-catalog/warehouse/default/test_data_04
> Found 3 items
> drwxr-xr-x   - hive hdfs          0 2024-05-23 10:26 
> /iceberg-catalog/warehouse/default/test_data_04/-tmp.HIVE_UNION_SUBDIR_1
> drwxr-xr-x   - hive hdfs          0 2024-05-23 10:26 
> /iceberg-catalog/warehouse/default/test_data_04/-tmp.HIVE_UNION_SUBDIR_2
> drwxr-xr-x   - hive hdfs          0 2024-05-23 10:26 
> /iceberg-catalog/warehouse/default/test_data_04/-tmp.HIVE_UNION_SUBDIR_3
> {code}
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-28277) HIVE does not support update operations for ICEBERG of type location_based_table.

2024-05-22 Thread yongzhi.shao (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yongzhi.shao updated HIVE-28277:

Description: 
Currently, when I update the location_based_table using hive, hive incorrectly 
empties all data directories and metadata directories.

After the update statement is executed, the iceberg table is corrupted.

 
{code:java}
--spark 3.4.1 + iceberg 1.5.2:
CREATE TABLE IF NOT EXISTS datacenter.default.test_data_04 (
id string,name string
)
using iceberg
PARTITIONED BY (name)
TBLPROPERTIES 
('read.orc.vectorization.enabled'='true','write.format.default'='orc','write.orc.bloom.filter.columns'='id','write.orc.compression-codec'='zstd','write.metadata.previous-versions-max'='3','write.metadata.delete-after-commit.enabled'='true');

insert into datacenter.default.test_data_04(id,name) values('1','a'),('2','b');

--hive4:
CREATE EXTERNAL TABLE default.test_data_04
STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler' 
LOCATION 'hdfs:///iceberg-catalog/warehouse/default/test_data_04'
TBLPROPERTIES 
('iceberg.catalog'='location_based_table','engine.hive.enabled'='true');

select distinct id,name from (select id,name from default.test_data_04 limit 
10) s1; --2 row

update test_data_04 set name = 'adasd' where id = '1';

ERROR:
2024-05-23T10:26:32,028 ERROR [HiveServer2-Background-Pool: Thread-297] 
hive.HiveIcebergStorageHandler: Error while trying to commit job: 
job_17061635207991_169536, job_17061635207990_169536, 
job_17061635207992_169536, starting rollback changes for table: 
default.test_data_04
org.apache.iceberg.exceptions.NoSuchTableException: Table does not exist at 
location: /iceberg-catalog/warehouse/default/test_data_04


BEFORE UPDATE:
ICEBERG TABLE DIR:
[root@ ~]# hdfs dfs -ls /iceberg-catalog/warehouse/default/test_data_04
Found 2 items
drwxr-xr-x   - hive hdfs          0 2024-05-23 09:26 
/iceberg-catalog/warehouse/default/test_data_04/data
drwxr-xr-x   - hive hdfs          0 2024-05-23 09:26 
/iceberg-catalog/warehouse/default/test_data_04/metadata


AFTER UPDATE:
ICEBERG TABLE DIR:

[root@XXX ~]# hdfs dfs -ls /iceberg-catalog/warehouse/default/test_data_04
Found 3 items
drwxr-xr-x   - hive hdfs          0 2024-05-23 10:26 
/iceberg-catalog/warehouse/default/test_data_04/-tmp.HIVE_UNION_SUBDIR_1
drwxr-xr-x   - hive hdfs          0 2024-05-23 10:26 
/iceberg-catalog/warehouse/default/test_data_04/-tmp.HIVE_UNION_SUBDIR_2
drwxr-xr-x   - hive hdfs          0 2024-05-23 10:26 
/iceberg-catalog/warehouse/default/test_data_04/-tmp.HIVE_UNION_SUBDIR_3


{code}
 

 

  was:
Currently, when I update the location_based_table using hive, hive incorrectly 
empties all data directories and metadata directories.

After the update statement is executed, the iceberg table is corrupted.

 
{code:java}
--spark:
CREATE TABLE IF NOT EXISTS datacenter.default.test_data_04 (
id string,name string
)
using iceberg
PARTITIONED BY (name)
TBLPROPERTIES 
('read.orc.vectorization.enabled'='true','write.format.default'='orc','write.orc.bloom.filter.columns'='id','write.orc.compression-codec'='zstd','write.metadata.previous-versions-max'='3','write.metadata.delete-after-commit.enabled'='true');

insert into datacenter.default.test_data_04(id,name) values('1','a'),('2','b');

--hive4:
CREATE EXTERNAL TABLE default.test_data_04
STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler' 
LOCATION 'hdfs:///iceberg-catalog/warehouse/default/test_data_04'
TBLPROPERTIES 
('iceberg.catalog'='location_based_table','engine.hive.enabled'='true');

select distinct id,name from (select id,name from default.test_data_04 limit 
10) s1; --2 row

update test_data_04 set name = 'adasd' where id = '1';

ERROR:
2024-05-23T10:26:32,028 ERROR [HiveServer2-Background-Pool: Thread-297] 
hive.HiveIcebergStorageHandler: Error while trying to commit job: 
job_17061635207991_169536, job_17061635207990_169536, 
job_17061635207992_169536, starting rollback changes for table: 
default.test_data_04
org.apache.iceberg.exceptions.NoSuchTableException: Table does not exist at 
location: /iceberg-catalog/warehouse/default/test_data_04


BEFORE UPDATE:
ICEBERG TABLE DIR:
[root@ ~]# hdfs dfs -ls /iceberg-catalog/warehouse/default/test_data_04
Found 2 items
drwxr-xr-x   - hive hdfs          0 2024-05-23 09:26 
/iceberg-catalog/warehouse/default/test_data_04/data
drwxr-xr-x   - hive hdfs          0 2024-05-23 09:26 
/iceberg-catalog/warehouse/default/test_data_04/metadata


AFTER UPDATE:
ICEBERG TABLE DIR:

[root@XXX ~]# hdfs dfs -ls /iceberg-catalog/warehouse/default/test_data_04
Found 3 items
drwxr-xr-x   - hive hdfs          0 2024-05-23 10:26 
/iceberg-catalog/warehouse/default/test_data_04/-tmp.HIVE_UNION_SUBDIR_1
drwxr-xr-x   - hive hdfs          0 2024-05-23 10:26 
/iceberg-catalog/warehouse/default/test_data_04/-tmp.HIVE_UNION_SUBDIR_2
drwxr-xr-x   - hive hdfs          0 2024-05-23 10:26 

[jira] [Updated] (HIVE-28277) HIVE does not support update operations for ICEBERG of type location_based_table.

2024-05-22 Thread yongzhi.shao (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yongzhi.shao updated HIVE-28277:

Description: 
Currently, when I update the location_based_table using hive, hive incorrectly 
empties all data directories and metadata directories.

After the update statement is executed, the iceberg table is corrupted.

 
{code:java}
--spark:
CREATE TABLE IF NOT EXISTS datacenter.default.test_data_04 (
id string,name string
)
using iceberg
PARTITIONED BY (name)
TBLPROPERTIES 
('read.orc.vectorization.enabled'='true','write.format.default'='orc','write.orc.bloom.filter.columns'='id','write.orc.compression-codec'='zstd','write.metadata.previous-versions-max'='3','write.metadata.delete-after-commit.enabled'='true');

insert into datacenter.default.test_data_04(id,name) values('1','a'),('2','b');

--hive4:
CREATE EXTERNAL TABLE default.test_data_04
STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler' 
LOCATION 'hdfs:///iceberg-catalog/warehouse/default/test_data_04'
TBLPROPERTIES 
('iceberg.catalog'='location_based_table','engine.hive.enabled'='true');

select distinct id,name from (select id,name from default.test_data_04 limit 
10) s1; --2 row

update test_data_04 set name = 'adasd' where id = '1';

ERROR:
2024-05-23T10:26:32,028 ERROR [HiveServer2-Background-Pool: Thread-297] 
hive.HiveIcebergStorageHandler: Error while trying to commit job: 
job_17061635207991_169536, job_17061635207990_169536, 
job_17061635207992_169536, starting rollback changes for table: 
default.test_data_04
org.apache.iceberg.exceptions.NoSuchTableException: Table does not exist at 
location: /iceberg-catalog/warehouse/default/test_data_04


BEFORE UPDATE:
ICEBERG TABLE DIR:
[root@ ~]# hdfs dfs -ls /iceberg-catalog/warehouse/default/test_data_04
Found 2 items
drwxr-xr-x   - hive hdfs          0 2024-05-23 09:26 
/iceberg-catalog/warehouse/default/test_data_04/data
drwxr-xr-x   - hive hdfs          0 2024-05-23 09:26 
/iceberg-catalog/warehouse/default/test_data_04/metadata


AFTER UPDATE:
ICEBERG TABLE DIR:

[root@XXX ~]# hdfs dfs -ls /iceberg-catalog/warehouse/default/test_data_04
Found 3 items
drwxr-xr-x   - hive hdfs          0 2024-05-23 10:26 
/iceberg-catalog/warehouse/default/test_data_04/-tmp.HIVE_UNION_SUBDIR_1
drwxr-xr-x   - hive hdfs          0 2024-05-23 10:26 
/iceberg-catalog/warehouse/default/test_data_04/-tmp.HIVE_UNION_SUBDIR_2
drwxr-xr-x   - hive hdfs          0 2024-05-23 10:26 
/iceberg-catalog/warehouse/default/test_data_04/-tmp.HIVE_UNION_SUBDIR_3


{code}
 

 

  was:
Currently, when I update the location_based_table using hive, hive incorrectly 
empties all data directories and metadata directories.

 

 
{code:java}
--spark:
CREATE TABLE IF NOT EXISTS datacenter.default.test_data_04 (
id string,name string
)
using iceberg
PARTITIONED BY (name)
TBLPROPERTIES 
('read.orc.vectorization.enabled'='true','write.format.default'='orc','write.orc.bloom.filter.columns'='id','write.orc.compression-codec'='zstd','write.metadata.previous-versions-max'='3','write.metadata.delete-after-commit.enabled'='true');

insert into datacenter.default.test_data_04(id,name) values('1','a'),('2','b');

--hive4:
CREATE EXTERNAL TABLE default.test_data_04
STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler' 
LOCATION 'hdfs:///iceberg-catalog/warehouse/default/test_data_04'
TBLPROPERTIES 
('iceberg.catalog'='location_based_table','engine.hive.enabled'='true');

select distinct id,name from (select id,name from default.test_data_04 limit 
10) s1; --2 row

update test_data_04 set name = 'adasd' where id = '1';

ERROR:
2024-05-23T10:26:32,028 ERROR [HiveServer2-Background-Pool: Thread-297] 
hive.HiveIcebergStorageHandler: Error while trying to commit job: 
job_17061635207991_169536, job_17061635207990_169536, 
job_17061635207992_169536, starting rollback changes for table: 
default.test_data_04
org.apache.iceberg.exceptions.NoSuchTableException: Table does not exist at 
location: /iceberg-catalog/warehouse/default/test_data_04


BEFORE UPDATE:
ICEBERG TABLE DIR:
[root@ ~]# hdfs dfs -ls /iceberg-catalog/warehouse/default/test_data_04
Found 2 items
drwxr-xr-x   - hive hdfs          0 2024-05-23 09:26 
/iceberg-catalog/warehouse/default/test_data_04/data
drwxr-xr-x   - hive hdfs          0 2024-05-23 09:26 
/iceberg-catalog/warehouse/default/test_data_04/metadata


AFTER UPDATE:
ICEBERG TABLE DIR:

[root@XXX ~]# hdfs dfs -ls /iceberg-catalog/warehouse/default/test_data_04
Found 3 items
drwxr-xr-x   - hive hdfs          0 2024-05-23 10:26 
/iceberg-catalog/warehouse/default/test_data_04/-tmp.HIVE_UNION_SUBDIR_1
drwxr-xr-x   - hive hdfs          0 2024-05-23 10:26 
/iceberg-catalog/warehouse/default/test_data_04/-tmp.HIVE_UNION_SUBDIR_2
drwxr-xr-x   - hive hdfs          0 2024-05-23 10:26 
/iceberg-catalog/warehouse/default/test_data_04/-tmp.HIVE_UNION_SUBDIR_3


{code}
 

 


> HIVE does not support update 

[jira] [Created] (HIVE-28277) HIVE does not support update operations for ICEBERG of type location_based_table.

2024-05-22 Thread yongzhi.shao (Jira)
yongzhi.shao created HIVE-28277:
---

 Summary: HIVE does not support update operations for ICEBERG of 
type location_based_table.
 Key: HIVE-28277
 URL: https://issues.apache.org/jira/browse/HIVE-28277
 Project: Hive
  Issue Type: Improvement
  Components: Iceberg integration
Affects Versions: 4.0.0
 Environment: ICEBERG:1.5.2

HIVE 4.0.0
Reporter: yongzhi.shao


Currently, when I update the location_based_table using hive, hive incorrectly 
empties all data directories and metadata directories.

 

 
{code:java}
--spark:
CREATE TABLE IF NOT EXISTS datacenter.default.test_data_04 (
id string,name string
)
using iceberg
PARTITIONED BY (name)
TBLPROPERTIES 
('read.orc.vectorization.enabled'='true','write.format.default'='orc','write.orc.bloom.filter.columns'='id','write.orc.compression-codec'='zstd','write.metadata.previous-versions-max'='3','write.metadata.delete-after-commit.enabled'='true');

insert into datacenter.default.test_data_04(id,name) values('1','a'),('2','b');

--hive4:
CREATE EXTERNAL TABLE default.test_data_04
STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler' 
LOCATION 'hdfs:///iceberg-catalog/warehouse/default/test_data_04'
TBLPROPERTIES 
('iceberg.catalog'='location_based_table','engine.hive.enabled'='true');

select distinct id,name from (select id,name from default.test_data_04 limit 
10) s1; --2 row

update test_data_04 set name = 'adasd' where id = '1';

ERROR:
2024-05-23T10:26:32,028 ERROR [HiveServer2-Background-Pool: Thread-297] 
hive.HiveIcebergStorageHandler: Error while trying to commit job: 
job_17061635207991_169536, job_17061635207990_169536, 
job_17061635207992_169536, starting rollback changes for table: 
default.test_data_04
org.apache.iceberg.exceptions.NoSuchTableException: Table does not exist at 
location: /iceberg-catalog/warehouse/default/test_data_04


BEFORE UPDATE:
ICEBERG TABLE DIR:
[root@ ~]# hdfs dfs -ls /iceberg-catalog/warehouse/default/test_data_04
Found 2 items
drwxr-xr-x   - hive hdfs          0 2024-05-23 09:26 
/iceberg-catalog/warehouse/default/test_data_04/data
drwxr-xr-x   - hive hdfs          0 2024-05-23 09:26 
/iceberg-catalog/warehouse/default/test_data_04/metadata


AFTER UPDATE:
ICEBERG TABLE DIR:

[root@XXX ~]# hdfs dfs -ls /iceberg-catalog/warehouse/default/test_data_04
Found 3 items
drwxr-xr-x   - hive hdfs          0 2024-05-23 10:26 
/iceberg-catalog/warehouse/default/test_data_04/-tmp.HIVE_UNION_SUBDIR_1
drwxr-xr-x   - hive hdfs          0 2024-05-23 10:26 
/iceberg-catalog/warehouse/default/test_data_04/-tmp.HIVE_UNION_SUBDIR_2
drwxr-xr-x   - hive hdfs          0 2024-05-23 10:26 
/iceberg-catalog/warehouse/default/test_data_04/-tmp.HIVE_UNION_SUBDIR_3


{code}
 

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-28276) Iceberg: Make Iceberg split threads configurable when table scanning

2024-05-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-28276:
--
Labels: pull-request-available  (was: )

> Iceberg: Make Iceberg split threads configurable when table scanning
> 
>
> Key: HIVE-28276
> URL: https://issues.apache.org/jira/browse/HIVE-28276
> Project: Hive
>  Issue Type: Improvement
>  Components: Iceberg integration
>Reporter: Butao Zhang
>Assignee: Butao Zhang
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)
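
Since the ticket carries no description here, a hedged sketch of what "making split threads configurable" would presumably wire together: Iceberg's scan planning already accepts a caller-supplied executor via planWith(). The property name below is hypothetical, invented only for illustration; it is not the flag introduced by the actual patch.

{code:java}
// Hypothetical sketch; the property name is made up and this is not the HIVE-28276 patch.
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import org.apache.hadoop.conf.Configuration;
import org.apache.iceberg.CombinedScanTask;
import org.apache.iceberg.Table;
import org.apache.iceberg.io.CloseableIterable;

public class SplitPlanningSketch {
  public static CloseableIterable<CombinedScanTask> planTasks(Table table, Configuration conf) {
    // Read a (hypothetical) knob for the planning pool size; default to a single thread.
    int threads = conf.getInt("hive.iceberg.split.planning.threads", 1);
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    // planWith() lets the caller supply the executor used during manifest reading / split planning.
    return table.newScan().planWith(pool).planTasks();
    // A real implementation would reuse and eventually shut down the pool.
  }
}
{code}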


[jira] [Created] (HIVE-28276) Iceberg: Make Iceberg split threads configurable when table scanning

2024-05-22 Thread Butao Zhang (Jira)
Butao Zhang created HIVE-28276:
--

 Summary: Iceberg: Make Iceberg split threads configurable when 
table scanning
 Key: HIVE-28276
 URL: https://issues.apache.org/jira/browse/HIVE-28276
 Project: Hive
  Issue Type: Improvement
  Components: Iceberg integration
Reporter: Butao Zhang






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (HIVE-28276) Iceberg: Make Iceberg split threads configurable when table scanning

2024-05-22 Thread Butao Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Butao Zhang reassigned HIVE-28276:
--

Assignee: Butao Zhang

> Iceberg: Make Iceberg split threads configurable when table scanning
> 
>
> Key: HIVE-28276
> URL: https://issues.apache.org/jira/browse/HIVE-28276
> Project: Hive
>  Issue Type: Improvement
>  Components: Iceberg integration
>Reporter: Butao Zhang
>Assignee: Butao Zhang
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (HIVE-25351) stddev(), stddev_pop() with CBO enable returning null

2024-05-22 Thread Jiandan Yang (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jiandan Yang  reassigned HIVE-25351:


Assignee: Jiandan Yang   (was: Dayakar M)

> stddev(), stddev_pop() with CBO enable returning null
> -
>
> Key: HIVE-25351
> URL: https://issues.apache.org/jira/browse/HIVE-25351
> Project: Hive
>  Issue Type: Bug
>Reporter: Ashish Sharma
>Assignee: Jiandan Yang 
>Priority: Blocker
>  Labels: pull-request-available
>
> *script used to repro*
> create table cbo_test (key string, v1 double, v2 decimal(30,2), v3 
> decimal(30,2));
> insert into cbo_test values ("00140006375905", 10230.72, 
> 10230.72, 10230.69), ("00140006375905", 10230.72, 10230.72, 
> 10230.69), ("00140006375905", 10230.72, 10230.72, 10230.69), 
> ("00140006375905", 10230.72, 10230.72, 10230.69), 
> ("00140006375905", 10230.72, 10230.72, 10230.69), 
> ("00140006375905", 10230.72, 10230.72, 10230.69);
> select stddev(v1), stddev(v2), stddev(v3) from cbo_test;
> *Enable CBO*
> ++
> |  Explain   |
> ++
> | Plan optimized by CBO. |
> ||
> | Vertex dependency in root stage|
> | Reducer 2 <- Map 1 (CUSTOM_SIMPLE_EDGE)|
> ||
> | Stage-0|
> |   Fetch Operator   |
> | limit:-1   |
> | Stage-1|
> |   Reducer 2 vectorized |
> |   File Output Operator [FS_13] |
> | Select Operator [SEL_12] (rows=1 width=24) |
> |   Output:["_col0","_col1","_col2"] |
> |   Group By Operator [GBY_11] (rows=1 width=72) |
> | 
> Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8"],aggregations:["sum(VALUE._col0)","sum(VALUE._col1)","count(VALUE._col2)","sum(VALUE._col3)","sum(VALUE._col4)","count(VALUE._col5)","sum(VALUE._col6)","sum(VALUE._col7)","count(VALUE._col8)"]
>  |
> |   <-Map 1 [CUSTOM_SIMPLE_EDGE] vectorized  |
> | PARTITION_ONLY_SHUFFLE [RS_10] |
> |   Group By Operator [GBY_9] (rows=1 width=72) |
> | 
> Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8"],aggregations:["sum(_col3)","sum(_col0)","count(_col0)","sum(_col5)","sum(_col4)","count(_col1)","sum(_col7)","sum(_col6)","count(_col2)"]
>  |
> | Select Operator [SEL_8] (rows=6 width=232) |
> |   
> Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7"] |
> |   TableScan [TS_0] (rows=6 width=232) |
> | default@cbo_test,cbo_test, ACID 
> table,Tbl:COMPLETE,Col:COMPLETE,Output:["v1","v2","v3"] |
> ||
> ++
> *Query Result* 
> _c0   _c1 _c2
> 0.0   NaN NaN
> *Disable CBO*
> ++
> |  Explain   |
> ++
> | Vertex dependency in root stage|
> | Reducer 2 <- Map 1 (CUSTOM_SIMPLE_EDGE)|
> ||
> | Stage-0|
> |   Fetch Operator   |
> | limit:-1   |
> | Stage-1|
> |   Reducer 2 vectorized |
> |   File Output Operator [FS_11] |
> | Group By Operator [GBY_10] (rows=1 width=24) |
> |   
> Output:["_col0","_col1","_col2"],aggregations:["stddev(VALUE._col0)","stddev(VALUE._col1)","stddev(VALUE._col2)"]
>  |
> | <-Map 1 [CUSTOM_SIMPLE_EDGE] vectorized|
> |   PARTITION_ONLY_SHUFFLE [RS_9]|
> | Group By Operator [GBY_8] (rows=1 width=240) |
> |   
> Output:["_col0","_col1","_col2"],aggregations:["stddev(v1)","stddev(v2)","stddev(v3)"]
>  |
> |   Select Operator [SEL_7] (rows=6 width=232) |
> | Output:["v1","v2","v3"]|
> | TableScan [TS_0] (rows=6 width=232) |
> |   default@cbo_test,cbo_test, ACID 
> table,Tbl:COMPLETE,Col:COMPLETE,Output:["v1","v2","v3"] |
> |  

[jira] [Updated] (HIVE-28274) Iceberg: Add support for 'If Not Exists' and 'or Replace' for Create Branch

2024-05-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-28274:
--
Labels: pull-request-available  (was: )

> Iceberg: Add support for 'If Not Exists' and 'or Replace' for Create Branch
> ---
>
> Key: HIVE-28274
> URL: https://issues.apache.org/jira/browse/HIVE-28274
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
>  Labels: pull-request-available
>
> Add support for 
> {noformat}
> -- CREATE audit-branch at current snapshot with default retention if it 
> doesn't exist.
> ALTER TABLE prod.db.sample CREATE BRANCH IF NOT EXISTS `audit-branch`
> -- CREATE audit-branch at current snapshot with default retention or REPLACE 
> it if it already exists.
> ALTER TABLE prod.db.sample CREATE OR REPLACE BRANCH `audit-branch`{noformat}
> Like Spark:
> https://iceberg.apache.org/docs/1.5.1/spark-ddl/#branching-and-tagging-ddl



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-28274) Iceberg: Add support for 'If Not Exists' and 'or Replace' for Create Branch

2024-05-22 Thread Ayush Saxena (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ayush Saxena updated HIVE-28274:

Summary: Iceberg: Add support for 'If Not Exists' and 'or Replace' for 
Create Branch  (was: Iceberg: Add support for 'If Not Exists" and 'or Replace' 
for Create Branch)

> Iceberg: Add support for 'If Not Exists' and 'or Replace' for Create Branch
> ---
>
> Key: HIVE-28274
> URL: https://issues.apache.org/jira/browse/HIVE-28274
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
>
> Add support for 
> {noformat}
> -- CREATE audit-branch at current snapshot with default retention if it 
> doesn't exist.
> ALTER TABLE prod.db.sample CREATE BRANCH IF NOT EXISTS `audit-branch`
> -- CREATE audit-branch at current snapshot with default retention or REPLACE 
> it if it already exists.
> ALTER TABLE prod.db.sample CREATE OR REPLACE BRANCH `audit-branch`{noformat}
> Like Spark:
> https://iceberg.apache.org/docs/1.5.1/spark-ddl/#branching-and-tagging-ddl



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HIVE-28275) Iceberg: Add support for 'If Not Exists" and 'or Replace' for Create Tag

2024-05-22 Thread Ayush Saxena (Jira)
Ayush Saxena created HIVE-28275:
---

 Summary: Iceberg: Add support for 'If Not Exists" and 'or Replace' 
for Create Tag 
 Key: HIVE-28275
 URL: https://issues.apache.org/jira/browse/HIVE-28275
 Project: Hive
  Issue Type: Sub-task
Reporter: Ayush Saxena
Assignee: Ayush Saxena


Add support for If not exists and Or Replace while creating Tags
{noformat}
-- CREATE historical-tag at current snapshot with default retention if it 
doesn't exist.
ALTER TABLE prod.db.sample CREATE TAG IF NOT EXISTS `historical-tag`

-- CREATE historical-tag at current snapshot with default retention or REPLACE 
it if it already exists.
ALTER TABLE prod.db.sample CREATE OR REPLACE TAG `historical-tag`{noformat}
Like Spark:

https://iceberg.apache.org/docs/1.5.1/spark-ddl/#alter-table-create-branch



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HIVE-28274) Iceberg: Add support for 'If Not Exists" and 'or Replace' for Create Branch

2024-05-22 Thread Ayush Saxena (Jira)
Ayush Saxena created HIVE-28274:
---

 Summary: Iceberg: Add support for 'If Not Exists" and 'or Replace' 
for Create Branch
 Key: HIVE-28274
 URL: https://issues.apache.org/jira/browse/HIVE-28274
 Project: Hive
  Issue Type: Sub-task
Reporter: Ayush Saxena
Assignee: Ayush Saxena


Add support for 
{noformat}
-- CREATE audit-branch at current snapshot with default retention if it doesn't 
exist.
ALTER TABLE prod.db.sample CREATE BRANCH IF NOT EXISTS `audit-branch`

-- CREATE audit-branch at current snapshot with default retention or REPLACE it 
if it already exists.
ALTER TABLE prod.db.sample CREATE OR REPLACE BRANCH `audit-branch`{noformat}
Like Spark:

https://iceberg.apache.org/docs/1.5.1/spark-ddl/#branching-and-tagging-ddl



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work started] (HIVE-28273) Test data generation failure in HIVE-28249 related tests

2024-05-22 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-28273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-28273 started by Csaba Juhász.
---
> Test data generation failure in HIVE-28249 related tests
> 
>
> Key: HIVE-28273
> URL: https://issues.apache.org/jira/browse/HIVE-28273
> Project: Hive
>  Issue Type: Bug
>Reporter: Csaba Juhász
>Assignee: Csaba Juhász
>Priority: Major
>  Labels: pull-request-available
> Attachments: image-2024-05-22-19-11-35-890.png
>
>
> generateJulianLeapYearTimestamps and generateJulianLeapYearTimestamps28thFeb 
> are throwing NegativeArraySizeException once the base value equals or is over 
> 999
> This is caused by the below code, supplying a negative value (when digits 
> return a value larger than 4) to zeros, which in turn is used to create a new 
> char array.
> {code:java}
> StringBuilder sb = new StringBuilder(29);
> int year = ((i % ) + 1) * 100;
> sb.append(zeros(4 - digits(year)));
> {code}
> When the tests are run using maven, the error in the generation function is 
> caught but never rethrown or reported and  the build is reported successful. 
> For example running
> _TestParquetTimestampsHive2Compatibility#testWriteHive2ReadHive4UsingLegacyConversionWithJulianLeapYearsFor28thFeb_
>  has the result:
> {code:java}
> [INFO] ---
> [INFO]  T E S T S
> [INFO] ---
> [INFO] Running 
> org.apache.hadoop.hive.ql.io.parquet.serde.TestParquetTimestampsHive2Compatibility
> [INFO] Tests run: 396, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
> 0.723 s - in 
> org.apache.hadoop.hive.ql.io.parquet.serde.TestParquetTimestampsHive2Compatibility
> [INFO] 
> [INFO] Results:
> [INFO] 
> [INFO] Tests run: 396, Failures: 0, Errors: 0, Skipped: 0
> ...
> [INFO] BUILD SUCCESS
> {code}
> When the test is run through an IDE (eg VSCode), the failure is reported 
> properly.
>  !image-2024-05-22-19-11-35-890.png! 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
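
To make the failure mode concrete: zeros(4 - digits(year)) goes negative as soon as the generated year has more than four digits, and the negative char-array size is what throws. Below is a self-contained illustration with stand-in helpers; the clamp is a hypothetical guard for illustration, not necessarily the fix in the linked pull request.

{code:java}
import java.util.Arrays;

// Stand-in helpers so the snippet compiles; they mirror what the ticket describes.
public class PadSketch {
  static int digits(int n) {
    return String.valueOf(n).length();
  }

  static char[] zeros(int count) {
    char[] c = new char[count];          // throws NegativeArraySizeException if count < 0
    Arrays.fill(c, '0');
    return c;
  }

  static String padYear(int year) {
    StringBuilder sb = new StringBuilder(29);
    // Clamping at zero keeps a 5+ digit year from producing new char[-1].
    sb.append(zeros(Math.max(0, 4 - digits(year))));
    sb.append(year);
    return sb.toString();
  }

  public static void main(String[] args) {
    System.out.println(padYear(100));    // "0100"
    System.out.println(padYear(10000));  // "10000" instead of an exception
  }
}
{code}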


[jira] [Assigned] (HIVE-28273) Test data generation failure in HIVE-28249 related tests

2024-05-22 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-28273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Csaba Juhász reassigned HIVE-28273:
---

Assignee: Csaba Juhász

> Test data generation failure in HIVE-28249 related tests
> 
>
> Key: HIVE-28273
> URL: https://issues.apache.org/jira/browse/HIVE-28273
> Project: Hive
>  Issue Type: Bug
>Reporter: Csaba Juhász
>Assignee: Csaba Juhász
>Priority: Major
>  Labels: pull-request-available
> Attachments: image-2024-05-22-19-11-35-890.png
>
>
> generateJulianLeapYearTimestamps and generateJulianLeapYearTimestamps28thFeb 
> are throwing NegativeArraySizeException once the base value equals or is over 
> 999
> This is caused by the below code, supplying a negative value (when digits 
> return a value larger than 4) to zeros, which in turn is used to create a new 
> char array.
> {code:java}
> StringBuilder sb = new StringBuilder(29);
> int year = ((i % ) + 1) * 100;
> sb.append(zeros(4 - digits(year)));
> {code}
> When the tests are run using maven, the error in the generation function is 
> caught but never rethrown or reported and  the build is reported successful. 
> For example running
> _TestParquetTimestampsHive2Compatibility#testWriteHive2ReadHive4UsingLegacyConversionWithJulianLeapYearsFor28thFeb_
>  has the result:
> {code:java}
> [INFO] ---
> [INFO]  T E S T S
> [INFO] ---
> [INFO] Running 
> org.apache.hadoop.hive.ql.io.parquet.serde.TestParquetTimestampsHive2Compatibility
> [INFO] Tests run: 396, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
> 0.723 s - in 
> org.apache.hadoop.hive.ql.io.parquet.serde.TestParquetTimestampsHive2Compatibility
> [INFO] 
> [INFO] Results:
> [INFO] 
> [INFO] Tests run: 396, Failures: 0, Errors: 0, Skipped: 0
> ...
> [INFO] BUILD SUCCESS
> {code}
> When the test is run through an IDE (eg VSCode), the failure is reported 
> properly.
>  !image-2024-05-22-19-11-35-890.png! 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-28273) Test data generation failure in HIVE-28249 related tests

2024-05-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-28273:
--
Labels: pull-request-available  (was: )

> Test data generation failure in HIVE-28249 related tests
> 
>
> Key: HIVE-28273
> URL: https://issues.apache.org/jira/browse/HIVE-28273
> Project: Hive
>  Issue Type: Bug
>Reporter: Csaba Juhász
>Priority: Major
>  Labels: pull-request-available
> Attachments: image-2024-05-22-19-11-35-890.png
>
>
> generateJulianLeapYearTimestamps and generateJulianLeapYearTimestamps28thFeb 
> are throwing NegativeArraySizeException once the base value equals or is over 
> 999
> This is caused by the below code, supplying a negative value (when digits 
> return a value larger than 4) to zeros, which in turn is used to create a new 
> char array.
> {code:java}
> StringBuilder sb = new StringBuilder(29);
> int year = ((i % ) + 1) * 100;
> sb.append(zeros(4 - digits(year)));
> {code}
> When the tests are run using maven, the error in the generation function is 
> caught but never rethrown or reported and  the build is reported successful. 
> For example running
> _TestParquetTimestampsHive2Compatibility#testWriteHive2ReadHive4UsingLegacyConversionWithJulianLeapYearsFor28thFeb_
>  has the result:
> {code:java}
> [INFO] ---
> [INFO]  T E S T S
> [INFO] ---
> [INFO] Running 
> org.apache.hadoop.hive.ql.io.parquet.serde.TestParquetTimestampsHive2Compatibility
> [INFO] Tests run: 396, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
> 0.723 s - in 
> org.apache.hadoop.hive.ql.io.parquet.serde.TestParquetTimestampsHive2Compatibility
> [INFO] 
> [INFO] Results:
> [INFO] 
> [INFO] Tests run: 396, Failures: 0, Errors: 0, Skipped: 0
> ...
> [INFO] BUILD SUCCESS
> {code}
> When the test is run through an IDE (eg VSCode), the failure is reported 
> properly.
>  !image-2024-05-22-19-11-35-890.png! 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-28270) Fix missing partition paths bug on drop_database

2024-05-22 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-28270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17848691#comment-17848691
 ] 

Ayush Saxena commented on HIVE-28270:
-

Committed to master.

Thanx [~wechar] for the contribution!!!

> Fix missing partition paths  bug on drop_database
> -
>
> Key: HIVE-28270
> URL: https://issues.apache.org/jira/browse/HIVE-28270
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Wechar
>Assignee: Wechar
>Priority: Major
>  Labels: pull-request-available
>
> In {{HMSHandler#drop_database_core}}, it needs to collect all partition paths 
> that are not in a subdirectory of the table path, but currently it only fetches 
> the last batch of paths.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
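
A hedged illustration of the bug pattern the description points at (names are invented; this is not the actual HMSHandler#drop_database_core code): when external partition paths are gathered batch by batch, assigning the result inside the loop keeps only the last batch, while accumulating keeps them all.

{code:java}
import java.util.ArrayList;
import java.util.List;

// Illustrative only; externalPathsOf() is a hypothetical stand-in.
public class DropDatabasePathsSketch {
  static List<String> collectExternalPartitionPaths(List<List<String>> partitionBatches) {
    List<String> partPaths = new ArrayList<>();
    for (List<String> batch : partitionBatches) {
      // Buggy variant: partPaths = externalPathsOf(batch);   // overwrites, keeps only the last batch
      partPaths.addAll(externalPathsOf(batch));               // fixed variant: accumulate across batches
    }
    return partPaths;
  }

  // Stand-in for "partition locations that are not under the table path".
  static List<String> externalPathsOf(List<String> batch) {
    return new ArrayList<>(batch);
  }
}
{code}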


[jira] [Updated] (HIVE-28270) Fix missing partition paths bug on drop_database

2024-05-22 Thread Ayush Saxena (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ayush Saxena updated HIVE-28270:

Labels: hive-4.0.1-must pull-request-available  (was: 
pull-request-available)

> Fix missing partition paths  bug on drop_database
> -
>
> Key: HIVE-28270
> URL: https://issues.apache.org/jira/browse/HIVE-28270
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Wechar
>Assignee: Wechar
>Priority: Major
>  Labels: hive-4.0.1-must, pull-request-available
> Fix For: 4.1.0
>
>
> In {{HMSHandler#drop_database_core}}, it needs to collect all partition paths 
> that are not in a subdirectory of the table path, but currently it only fetches 
> the last batch of paths.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HIVE-28270) Fix missing partition paths bug on drop_database

2024-05-22 Thread Ayush Saxena (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ayush Saxena resolved HIVE-28270.
-
Fix Version/s: 4.1.0
   Resolution: Fixed

> Fix missing partition paths  bug on drop_database
> -
>
> Key: HIVE-28270
> URL: https://issues.apache.org/jira/browse/HIVE-28270
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Wechar
>Assignee: Wechar
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.1.0
>
>
> In {{HMSHandler#drop_database_core}}, it needs to collect all partition paths 
> that are not in a subdirectory of the table path, but currently it only fetches 
> the last batch of paths.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-28271) DirectSql fails for AlterPartitions

2024-05-22 Thread Ayush Saxena (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ayush Saxena updated HIVE-28271:

Labels: hive-4.0.1-must pull-request-available  (was: 
pull-request-available)

> DirectSql fails for AlterPartitions
> ---
>
> Key: HIVE-28271
> URL: https://issues.apache.org/jira/browse/HIVE-28271
> Project: Hive
>  Issue Type: Bug
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
>  Labels: hive-4.0.1-must, pull-request-available
> Fix For: 4.1.0
>
>
> It fails at three places (missing handling for databases which use CLOB, and missing 
> Boolean type conversion checks):
> *First:*
> {noformat}
> 2024-05-21T08:50:16,570  WARN [main] metastore.ObjectStore: Falling back to 
> ORM path due to direct SQL failure (this is not an error): 
> java.lang.ClassCastException: org.apache.derby.impl.jdbc.EmbedClob cannot be 
> cast to java.lang.String at 
> org.apache.hadoop.hive.metastore.ExceptionHandler.newMetaException(ExceptionHandler.java:152)
>  at org.apache.hadoop.hive.metastore.Batchable.runBatched(Batchable.java:92) 
> at 
> org.apache.hadoop.hive.metastore.DirectSqlUpdatePart.getParams(DirectSqlUpdatePart.java:748)
>  at 
> org.apache.hadoop.hive.metastore.DirectSqlUpdatePart.updateParamTableInBatch(DirectSqlUpdatePart.java:715)
>  at 
> org.apache.hadoop.hive.metastore.DirectSqlUpdatePart.alterPartitions(DirectSqlUpdatePart.java:636)
>  at 
> org.apache.hadoop.hive.metastore.MetaStoreDirectSql.alterPartitions(MetaStoreDirectSql.java:599)
>  at 
> org.apache.hadoop.hive.metastore.ObjectStore$20.getSqlResult(ObjectStore.java:5371);
> {noformat}
> *Second:*
> {noformat}
> 2024-05-21T09:14:36,808  WARN [main] metastore.ObjectStore: Falling back to 
> ORM path due to direct SQL failure (this is not an error): 
> java.lang.ClassCastException: org.apache.derby.impl.jdbc.EmbedClob cannot be 
> cast to java.lang.String at 
> org.apache.hadoop.hive.metastore.ExceptionHandler.newMetaException(ExceptionHandler.java:152)
>  at org.apache.hadoop.hive.metastore.Batchable.runBatched(Batchable.java:92) 
> at 
> org.apache.hadoop.hive.metastore.DirectSqlUpdatePart.updateCDInBatch(DirectSqlUpdatePart.java:1228)
>  at 
> org.apache.hadoop.hive.metastore.DirectSqlUpdatePart.updateStorageDescriptorInBatch(DirectSqlUpdatePart.java:888)
>  at 
> org.apache.hadoop.hive.metastore.DirectSqlUpdatePart.alterPartitions(DirectSqlUpdatePart.java:638)
>  at 
> org.apache.hadoop.hive.metastore.MetaStoreDirectSql.alterPartitions(MetaStoreDirectSql.java:599)
>  at 
> org.apache.hadoop.hive.metastore.ObjectStore$20.getSqlResult(ObjectStore.java:5371);{noformat}
> *Third: Missing Boolean check type*
> {noformat}
> 2024-05-21T09:35:44,063  WARN [main] metastore.ObjectStore: Falling back to 
> ORM path due to direct SQL failure (this is not an error): 
> java.sql.BatchUpdateException: A truncation error was encountered trying to 
> shrink CHAR 'false' to length 1. at 
> org.apache.hadoop.hive.metastore.ExceptionHandler.newMetaException(ExceptionHandler.java:152)
>  at org.apache.hadoop.hive.metastore.Batchable.runBatched(Batchable.java:92) 
> at 
> org.apache.hadoop.hive.metastore.DirectSqlUpdatePart.lambda$updateSDInBatch$16(DirectSqlUpdatePart.java:926)
>  at 
> org.apache.hadoop.hive.metastore.DirectSqlUpdatePart.updateWithStatement(DirectSqlUpdatePart.java:656)
>  at 
> org.apache.hadoop.hive.metastore.DirectSqlUpdatePart.updateSDInBatch(DirectSqlUpdatePart.java:926)
>  at 
> org.apache.hadoop.hive.metastore.DirectSqlUpdatePart.updateStorageDescriptorInBatch(DirectSqlUpdatePart.java:900)
>  at 
> org.apache.hadoop.hive.metastore.DirectSqlUpdatePart.alterPartitions(DirectSqlUpdatePart.java:638)
>  at 
> org.apache.hadoop.hive.metastore.MetaStoreDirectSql.alterPartitions(MetaStoreDirectSql.java:599)
>  at 
> org.apache.hadoop.hive.metastore.ObjectStore$20.getSqlResult(ObjectStore.java:5371);
> {noformat}
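
The three traces above point at two type-mapping gaps in the direct SQL path: a 
CLOB column being cast straight to String, and a Java boolean being written into 
a CHAR(1) column as the literal "false". Below is a minimal JDBC sketch of the 
kind of conversions involved; the helper names are hypothetical and this is not 
the actual Hive patch.

{code:java}
import java.sql.Clob;
import java.sql.PreparedStatement;
import java.sql.SQLException;

final class DirectSqlTypeSketch {

  // Derby returns some parameter/column values as CLOB objects, so a blind cast
  // to String fails; read the CLOB's characters explicitly instead.
  static String asString(Object dbValue) throws SQLException {
    if (dbValue instanceof Clob) {
      Clob clob = (Clob) dbValue;
      return clob.getSubString(1, (int) clob.length()); // CLOB offsets are 1-based
    }
    return dbValue == null ? null : dbValue.toString();
  }

  // A CHAR(1) column cannot hold the five-character literal "false"; bind a
  // single character (or use setBoolean where the dialect has a real BOOLEAN type).
  static void bindBooleanAsChar(PreparedStatement ps, int idx, boolean value) throws SQLException {
    ps.setString(idx, value ? "Y" : "N");
  }
}
{code}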



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HIVE-28271) DirectSql fails for AlterPartitions

2024-05-22 Thread Ayush Saxena (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ayush Saxena resolved HIVE-28271.
-
Fix Version/s: 4.1.0
   Resolution: Fixed

> DirectSql fails for AlterPartitions
> ---
>
> Key: HIVE-28271
> URL: https://issues.apache.org/jira/browse/HIVE-28271
> Project: Hive
>  Issue Type: Bug
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.1.0
>
>
> It fails at three places: it misses handling for databases which use CLOB, and 
> it is missing Boolean type conversion checks.
> *First:*
> {noformat}
> 2024-05-21T08:50:16,570  WARN [main] metastore.ObjectStore: Falling back to 
> ORM path due to direct SQL failure (this is not an error): 
> java.lang.ClassCastException: org.apache.derby.impl.jdbc.EmbedClob cannot be 
> cast to java.lang.String at 
> org.apache.hadoop.hive.metastore.ExceptionHandler.newMetaException(ExceptionHandler.java:152)
>  at org.apache.hadoop.hive.metastore.Batchable.runBatched(Batchable.java:92) 
> at 
> org.apache.hadoop.hive.metastore.DirectSqlUpdatePart.getParams(DirectSqlUpdatePart.java:748)
>  at 
> org.apache.hadoop.hive.metastore.DirectSqlUpdatePart.updateParamTableInBatch(DirectSqlUpdatePart.java:715)
>  at 
> org.apache.hadoop.hive.metastore.DirectSqlUpdatePart.alterPartitions(DirectSqlUpdatePart.java:636)
>  at 
> org.apache.hadoop.hive.metastore.MetaStoreDirectSql.alterPartitions(MetaStoreDirectSql.java:599)
>  at 
> org.apache.hadoop.hive.metastore.ObjectStore$20.getSqlResult(ObjectStore.java:5371);
> {noformat}
> *Second:*
> {noformat}
> 2024-05-21T09:14:36,808  WARN [main] metastore.ObjectStore: Falling back to 
> ORM path due to direct SQL failure (this is not an error): 
> java.lang.ClassCastException: org.apache.derby.impl.jdbc.EmbedClob cannot be 
> cast to java.lang.String at 
> org.apache.hadoop.hive.metastore.ExceptionHandler.newMetaException(ExceptionHandler.java:152)
>  at org.apache.hadoop.hive.metastore.Batchable.runBatched(Batchable.java:92) 
> at 
> org.apache.hadoop.hive.metastore.DirectSqlUpdatePart.updateCDInBatch(DirectSqlUpdatePart.java:1228)
>  at 
> org.apache.hadoop.hive.metastore.DirectSqlUpdatePart.updateStorageDescriptorInBatch(DirectSqlUpdatePart.java:888)
>  at 
> org.apache.hadoop.hive.metastore.DirectSqlUpdatePart.alterPartitions(DirectSqlUpdatePart.java:638)
>  at 
> org.apache.hadoop.hive.metastore.MetaStoreDirectSql.alterPartitions(MetaStoreDirectSql.java:599)
>  at 
> org.apache.hadoop.hive.metastore.ObjectStore$20.getSqlResult(ObjectStore.java:5371);{noformat}
> *Third: Missing Boolean type check*
> {noformat}
> 2024-05-21T09:35:44,063  WARN [main] metastore.ObjectStore: Falling back to 
> ORM path due to direct SQL failure (this is not an error): 
> java.sql.BatchUpdateException: A truncation error was encountered trying to 
> shrink CHAR 'false' to length 1. at 
> org.apache.hadoop.hive.metastore.ExceptionHandler.newMetaException(ExceptionHandler.java:152)
>  at org.apache.hadoop.hive.metastore.Batchable.runBatched(Batchable.java:92) 
> at 
> org.apache.hadoop.hive.metastore.DirectSqlUpdatePart.lambda$updateSDInBatch$16(DirectSqlUpdatePart.java:926)
>  at 
> org.apache.hadoop.hive.metastore.DirectSqlUpdatePart.updateWithStatement(DirectSqlUpdatePart.java:656)
>  at 
> org.apache.hadoop.hive.metastore.DirectSqlUpdatePart.updateSDInBatch(DirectSqlUpdatePart.java:926)
>  at 
> org.apache.hadoop.hive.metastore.DirectSqlUpdatePart.updateStorageDescriptorInBatch(DirectSqlUpdatePart.java:900)
>  at 
> org.apache.hadoop.hive.metastore.DirectSqlUpdatePart.alterPartitions(DirectSqlUpdatePart.java:638)
>  at 
> org.apache.hadoop.hive.metastore.MetaStoreDirectSql.alterPartitions(MetaStoreDirectSql.java:599)
>  at 
> org.apache.hadoop.hive.metastore.ObjectStore$20.getSqlResult(ObjectStore.java:5371);
> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-28271) DirectSql fails for AlterPartitions

2024-05-22 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-28271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17848690#comment-17848690
 ] 

Ayush Saxena commented on HIVE-28271:
-

Committed to master.

Thanx [~zhangbutao] & [~wechar] for the review!!

> DirectSql fails for AlterPartitions
> ---
>
> Key: HIVE-28271
> URL: https://issues.apache.org/jira/browse/HIVE-28271
> Project: Hive
>  Issue Type: Bug
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
>  Labels: pull-request-available
>
> It fails at three places: it misses handling for databases which use CLOB, and 
> it is missing Boolean type conversion checks.
> *First:*
> {noformat}
> 2024-05-21T08:50:16,570  WARN [main] metastore.ObjectStore: Falling back to 
> ORM path due to direct SQL failure (this is not an error): 
> java.lang.ClassCastException: org.apache.derby.impl.jdbc.EmbedClob cannot be 
> cast to java.lang.String at 
> org.apache.hadoop.hive.metastore.ExceptionHandler.newMetaException(ExceptionHandler.java:152)
>  at org.apache.hadoop.hive.metastore.Batchable.runBatched(Batchable.java:92) 
> at 
> org.apache.hadoop.hive.metastore.DirectSqlUpdatePart.getParams(DirectSqlUpdatePart.java:748)
>  at 
> org.apache.hadoop.hive.metastore.DirectSqlUpdatePart.updateParamTableInBatch(DirectSqlUpdatePart.java:715)
>  at 
> org.apache.hadoop.hive.metastore.DirectSqlUpdatePart.alterPartitions(DirectSqlUpdatePart.java:636)
>  at 
> org.apache.hadoop.hive.metastore.MetaStoreDirectSql.alterPartitions(MetaStoreDirectSql.java:599)
>  at 
> org.apache.hadoop.hive.metastore.ObjectStore$20.getSqlResult(ObjectStore.java:5371);
> {noformat}
> *Second:*
> {noformat}
> 2024-05-21T09:14:36,808  WARN [main] metastore.ObjectStore: Falling back to 
> ORM path due to direct SQL failure (this is not an error): 
> java.lang.ClassCastException: org.apache.derby.impl.jdbc.EmbedClob cannot be 
> cast to java.lang.String at 
> org.apache.hadoop.hive.metastore.ExceptionHandler.newMetaException(ExceptionHandler.java:152)
>  at org.apache.hadoop.hive.metastore.Batchable.runBatched(Batchable.java:92) 
> at 
> org.apache.hadoop.hive.metastore.DirectSqlUpdatePart.updateCDInBatch(DirectSqlUpdatePart.java:1228)
>  at 
> org.apache.hadoop.hive.metastore.DirectSqlUpdatePart.updateStorageDescriptorInBatch(DirectSqlUpdatePart.java:888)
>  at 
> org.apache.hadoop.hive.metastore.DirectSqlUpdatePart.alterPartitions(DirectSqlUpdatePart.java:638)
>  at 
> org.apache.hadoop.hive.metastore.MetaStoreDirectSql.alterPartitions(MetaStoreDirectSql.java:599)
>  at 
> org.apache.hadoop.hive.metastore.ObjectStore$20.getSqlResult(ObjectStore.java:5371);{noformat}
> *Third: Missing Boolean type check*
> {noformat}
> 2024-05-21T09:35:44,063  WARN [main] metastore.ObjectStore: Falling back to 
> ORM path due to direct SQL failure (this is not an error): 
> java.sql.BatchUpdateException: A truncation error was encountered trying to 
> shrink CHAR 'false' to length 1. at 
> org.apache.hadoop.hive.metastore.ExceptionHandler.newMetaException(ExceptionHandler.java:152)
>  at org.apache.hadoop.hive.metastore.Batchable.runBatched(Batchable.java:92) 
> at 
> org.apache.hadoop.hive.metastore.DirectSqlUpdatePart.lambda$updateSDInBatch$16(DirectSqlUpdatePart.java:926)
>  at 
> org.apache.hadoop.hive.metastore.DirectSqlUpdatePart.updateWithStatement(DirectSqlUpdatePart.java:656)
>  at 
> org.apache.hadoop.hive.metastore.DirectSqlUpdatePart.updateSDInBatch(DirectSqlUpdatePart.java:926)
>  at 
> org.apache.hadoop.hive.metastore.DirectSqlUpdatePart.updateStorageDescriptorInBatch(DirectSqlUpdatePart.java:900)
>  at 
> org.apache.hadoop.hive.metastore.DirectSqlUpdatePart.alterPartitions(DirectSqlUpdatePart.java:638)
>  at 
> org.apache.hadoop.hive.metastore.MetaStoreDirectSql.alterPartitions(MetaStoreDirectSql.java:599)
>  at 
> org.apache.hadoop.hive.metastore.ObjectStore$20.getSqlResult(ObjectStore.java:5371);
> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HIVE-28273) Test data generation failure in HIVE-28249 related tests

2024-05-22 Thread Jira
Csaba Juhász created HIVE-28273:
---

 Summary: Test data generation failure in HIVE-28249 related tests
 Key: HIVE-28273
 URL: https://issues.apache.org/jira/browse/HIVE-28273
 Project: Hive
  Issue Type: Bug
Reporter: Csaba Juhász
 Attachments: image-2024-05-22-19-11-35-890.png

generateJulianLeapYearTimestamps and generateJulianLeapYearTimestamps28thFeb 
throw a NegativeArraySizeException once the base value equals or exceeds 999.

This is caused by the code below, which supplies a negative value to zeros 
(when digits returns a value larger than 4); that value is then used to create 
a new char array.

{code:java}
StringBuilder sb = new StringBuilder(29);
int year = ((i % ) + 1) * 100;
sb.append(zeros(4 - digits(year)));
{code}
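
For illustration, here is a self-contained sketch of the failure mode; zeros and 
digits are assumed to mirror the test's own helpers, and the guard at the end is 
only one possible fix, not the committed one.

{code:java}
import java.util.Arrays;

public class ZerosPaddingSketch {

  // Assumed shape of the test helper: a run of '0' characters of the given length.
  static String zeros(int len) {
    char[] pad = new char[len];          // len < 0 -> NegativeArraySizeException
    Arrays.fill(pad, '0');
    return new String(pad);
  }

  static int digits(int value) {
    return String.valueOf(value).length();
  }

  public static void main(String[] args) {
    int year = 1_000_000;                // 7 digits, as happens once the generated base grows large
    try {
      zeros(4 - digits(year));           // 4 - 7 == -3, so the array allocation throws
    } catch (NegativeArraySizeException e) {
      System.out.println("reproduced: " + e);
    }
    // One possible guard: never pad with a negative count.
    System.out.println("guarded pad length = " + Math.max(0, 4 - digits(year)));
  }
}
{code}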

When the tests are run using Maven, the error in the generation function is 
caught but never rethrown or reported, and the build is reported as successful. 
For example, running 
_TestParquetTimestampsHive2Compatibility#testWriteHive2ReadHive4UsingLegacyConversionWithJulianLeapYearsFor28thFeb_
yields the following result:


{code:java}
[INFO] ---
[INFO]  T E S T S
[INFO] ---
[INFO] Running 
org.apache.hadoop.hive.ql.io.parquet.serde.TestParquetTimestampsHive2Compatibility
[INFO] Tests run: 396, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.723 
s - in 
org.apache.hadoop.hive.ql.io.parquet.serde.TestParquetTimestampsHive2Compatibility
[INFO] 
[INFO] Results:
[INFO] 
[INFO] Tests run: 396, Failures: 0, Errors: 0, Skipped: 0

...

[INFO] BUILD SUCCESS
{code}

When the test is run through an IDE (e.g. VS Code), the failure is reported 
properly.

 !image-2024-05-22-19-11-35-890.png! 




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HIVE-28246) Fix confusing log message in LlapTaskSchedulerService

2024-05-22 Thread Ayush Saxena (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ayush Saxena resolved HIVE-28246.
-
Fix Version/s: 4.1.0
   Resolution: Fixed

> Fix confusing log message in LlapTaskSchedulerService
> -
>
> Key: HIVE-28246
> URL: https://issues.apache.org/jira/browse/HIVE-28246
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Bodor
>Assignee: Zoltán Rátkai
>Priority: Major
>  Labels: newbie, pull-request-available
> Fix For: 4.1.0
>
>
> https://github.com/apache/hive/blob/8415527101432bb5bf14b3c2a318a2cc40801b9a/llap-tez/src/java/org/apache/hadoop/hive/llap/tezplugins/LlapTaskSchedulerService.java#L1719
> {code}
>   WM_LOG.info("Registering " + taskInfo.attemptId + "; " + 
> taskInfo.isGuaranteed);
> {code}
> leads to a message like:
> {code}
> Registering attempt_1714730410273_0009_153_05_000235_10; false
> {code}
> "false" is out of any context, supposed to be something like:
> {code}
> Registering attempt_1714730410273_0009_153_05_000235_10, guaranteed: false
> {code}
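
A minimal sketch of a log statement that would produce the suggested wording; 
the field names are taken from the snippet above and the merged patch may differ.

{code:java}
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

class RegistrationLogSketch {
  private static final Logger WM_LOG = LoggerFactory.getLogger(RegistrationLogSketch.class);

  // Hypothetical stand-in for the scheduler's taskInfo fields referenced above.
  static void logRegistration(String attemptId, boolean isGuaranteed) {
    // Emits e.g. "Registering attempt_1714730410273_0009_153_05_000235_10, guaranteed: false"
    WM_LOG.info("Registering {}, guaranteed: {}", attemptId, isGuaranteed);
  }
}
{code}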



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-28246) Fix confusing log message in LlapTaskSchedulerService

2024-05-22 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-28246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17848645#comment-17848645
 ] 

Ayush Saxena commented on HIVE-28246:
-

Committed to master.

Thanx [~zratkai] for the contribution & [~aturoczy] for the review!!!

> Fix confusing log message in LlapTaskSchedulerService
> -
>
> Key: HIVE-28246
> URL: https://issues.apache.org/jira/browse/HIVE-28246
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Bodor
>Assignee: Zoltán Rátkai
>Priority: Major
>  Labels: newbie, pull-request-available
>
> https://github.com/apache/hive/blob/8415527101432bb5bf14b3c2a318a2cc40801b9a/llap-tez/src/java/org/apache/hadoop/hive/llap/tezplugins/LlapTaskSchedulerService.java#L1719
> {code}
>   WM_LOG.info("Registering " + taskInfo.attemptId + "; " + 
> taskInfo.isGuaranteed);
> {code}
> leads to a message like:
> {code}
> Registering attempt_1714730410273_0009_153_05_000235_10; false
> {code}
> "false" is out of any context, supposed to be something like:
> {code}
> Registering attempt_1714730410273_0009_153_05_000235_10, guaranteed: false
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-28246) Fix confusing log message in LlapTaskSchedulerService

2024-05-22 Thread Ayush Saxena (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ayush Saxena updated HIVE-28246:

Summary: Fix confusing log message in LlapTaskSchedulerService  (was: 
Confusing log messages in LlapTaskScheduler)

> Fix confusing log message in LlapTaskSchedulerService
> -
>
> Key: HIVE-28246
> URL: https://issues.apache.org/jira/browse/HIVE-28246
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Bodor
>Assignee: Zoltán Rátkai
>Priority: Major
>  Labels: newbie, pull-request-available
>
> https://github.com/apache/hive/blob/8415527101432bb5bf14b3c2a318a2cc40801b9a/llap-tez/src/java/org/apache/hadoop/hive/llap/tezplugins/LlapTaskSchedulerService.java#L1719
> {code}
>   WM_LOG.info("Registering " + taskInfo.attemptId + "; " + 
> taskInfo.isGuaranteed);
> {code}
> leads to a message like:
> {code}
> Registering attempt_1714730410273_0009_153_05_000235_10; false
> {code}
> "false" is out of any context, supposed to be something like:
> {code}
> Registering attempt_1714730410273_0009_153_05_000235_10, guaranteed: false
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HIVE-25974) Drop HiveFilterMergeRule and use FilterMergeRule from Calcite

2024-05-22 Thread Stamatis Zampetakis (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stamatis Zampetakis resolved HIVE-25974.

Fix Version/s: Not Applicable
   Resolution: Duplicate

> Drop HiveFilterMergeRule and use FilterMergeRule from Calcite
> -
>
> Key: HIVE-25974
> URL: https://issues.apache.org/jira/browse/HIVE-25974
> Project: Hive
>  Issue Type: Improvement
>  Components: CBO
>Affects Versions: 4.0.0
>Reporter: Alessandro Solimando
>Priority: Major
> Fix For: Not Applicable
>
>
> HiveFilterMergeRule is a copy of FilterMergeRule, which was needed because the 
> latter did not simplify/flatten before creating the merged filter.
> This behaviour was fixed in CALCITE-3982 (released in Calcite 1.23), so the 
> Hive rule can likely be removed now.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (HIVE-22633) GroupByOperator may throw NullPointerException when setting data skew optimization parameters

2024-05-22 Thread Butao Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17848545#comment-17848545
 ] 

Butao Zhang edited comment on HIVE-22633 at 5/22/24 10:46 AM:
--

Update: if you are using Hive3, you can try applying HIVE-27712 as a 
hotfix, which is easier to test than HIVE-23530.

Hive4 does not have this issue, as we no longer use this UDAF since HIVE-23530.


was (Author: zhangbutao):
Update: if you are using Hive3, you can try to patch this HIVE-27712 as a 
hotfix, which is easier to test than HIVE-23530.

> GroupByOperator may throw NullPointerException when setting data skew 
> optimization parameters
> -
>
> Key: HIVE-22633
> URL: https://issues.apache.org/jira/browse/HIVE-22633
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.1.0, 3.1.1, 4.0.0
>Reporter: Butao Zhang
>Assignee: Butao Zhang
>Priority: Major
>
> If hive.map.aggr and hive.groupby.skewindata are set to true, an exception 
> will be thrown.
> Steps to repro:
> 1. create table: 
> set hive.map.aggr=true;
> set hive.groupby.skewindata=true;
> create table test1 (id1 bigint);
> create table test2 (id2 bigint) partitioned by(dt2 string);
> insert into test2 partition(dt2='2020') select a.id1 from test1 a group by 
> a.id1;
> 2. NullPointerException:
> {code:java}
> ], TaskAttempt 2 failed, info=[Error: Error while running task ( failure ) : 
> attempt_1585641455670_0001_2_03_00_2:java.lang.RuntimeException: 
> java.lang.NullPointerException
> at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:296)
> at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250)
> at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
> at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
> at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682)
> at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
> at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
> at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
> at 
> com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:108)
> at 
> com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:41)
> at 
> com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:77)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.NullPointerException
> at 
> org.apache.hadoop.hive.ql.udf.generic.GenericUDAFComputeStats$GenericUDAFNumericStatsEvaluator.init(GenericUDAFComputeStats.java:373)
> at 
> org.apache.hadoop.hive.ql.exec.GroupByOperator.initializeOp(GroupByOperator.java:373)
> at 
> org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:360)
> at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.init(ReduceRecordProcessor.java:191)
> at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:266)
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-22633) GroupByOperator may throw NullPointerException when setting data skew optimization parameters

2024-05-22 Thread Butao Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17848545#comment-17848545
 ] 

Butao Zhang commented on HIVE-22633:


Update: if you are using Hive3, you can try applying HIVE-27712 as a 
hotfix, which is easier to test than HIVE-23530.

> GroupByOperator may throw NullPointerException when setting data skew 
> optimization parameters
> -
>
> Key: HIVE-22633
> URL: https://issues.apache.org/jira/browse/HIVE-22633
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.1.0, 3.1.1, 4.0.0
>Reporter: Butao Zhang
>Assignee: Butao Zhang
>Priority: Major
>
> If hive.map.aggr and hive.groupby.skewindata are set to true, an exception 
> will be thrown.
> Steps to repro:
> 1. create table: 
> set hive.map.aggr=true;
> set hive.groupby.skewindata=true;
> create table test1 (id1 bigint);
> create table test2 (id2 bigint) partitioned by(dt2 string);
> insert into test2 partition(dt2='2020') select a.id1 from test1 a group by 
> a.id1;
> 2. NullPointerException:
> {code:java}
> ], TaskAttempt 2 failed, info=[Error: Error while running task ( failure ) : 
> attempt_1585641455670_0001_2_03_00_2:java.lang.RuntimeException: 
> java.lang.NullPointerException
> at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:296)
> at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250)
> at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
> at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
> at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682)
> at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
> at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
> at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
> at 
> com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:108)
> at 
> com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:41)
> at 
> com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:77)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.NullPointerException
> at 
> org.apache.hadoop.hive.ql.udf.generic.GenericUDAFComputeStats$GenericUDAFNumericStatsEvaluator.init(GenericUDAFComputeStats.java:373)
> at 
> org.apache.hadoop.hive.ql.exec.GroupByOperator.initializeOp(GroupByOperator.java:373)
> at 
> org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:360)
> at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.init(ReduceRecordProcessor.java:191)
> at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:266)
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-28272) Support setting per-session S3 credentials in Warehouse

2024-05-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-28272:
--
Labels: pull-request-available  (was: )

> Support setting per-session S3 credentials in Warehouse
> ---
>
> Key: HIVE-28272
> URL: https://issues.apache.org/jira/browse/HIVE-28272
> Project: Hive
>  Issue Type: Improvement
>Reporter: Butao Zhang
>Assignee: Butao Zhang
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (HIVE-28272) Support setting per-session S3 credentials in Warehouse

2024-05-22 Thread Butao Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Butao Zhang reassigned HIVE-28272:
--

Assignee: Butao Zhang

> Support setting per-session S3 credentials in Warehouse
> ---
>
> Key: HIVE-28272
> URL: https://issues.apache.org/jira/browse/HIVE-28272
> Project: Hive
>  Issue Type: Improvement
>Reporter: Butao Zhang
>Assignee: Butao Zhang
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (HIVE-28268) Iceberg: Retrieve row count from iceberg SnapshotSummary in case of iceberg.hive.keep.stats=false

2024-05-22 Thread Butao Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Butao Zhang reassigned HIVE-28268:
--

Assignee: Butao Zhang

> Iceberg: Retrieve row count from iceberg SnapshotSummary in case of 
> iceberg.hive.keep.stats=false
> -
>
> Key: HIVE-28268
> URL: https://issues.apache.org/jira/browse/HIVE-28268
> Project: Hive
>  Issue Type: Task
>  Components: Iceberg integration
>Reporter: Butao Zhang
>Assignee: Butao Zhang
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HIVE-28272) Support setting per-session S3 credentials in Warehouse

2024-05-22 Thread Butao Zhang (Jira)
Butao Zhang created HIVE-28272:
--

 Summary: Support setting per-session S3 credentials in Warehouse
 Key: HIVE-28272
 URL: https://issues.apache.org/jira/browse/HIVE-28272
 Project: Hive
  Issue Type: Improvement
Reporter: Butao Zhang






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-25351) stddev(), stddev_pop() with CBO enable returning null

2024-05-22 Thread Dayakar M (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-25351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17848524#comment-17848524
 ] 

Dayakar M commented on HIVE-25351:
--

[~yangjiandan] I am not currently working on this issue; if you have a solution 
ready, you can take it over and fix it. Thanks.

> stddev(), stddev_pop() with CBO enable returning null
> -
>
> Key: HIVE-25351
> URL: https://issues.apache.org/jira/browse/HIVE-25351
> Project: Hive
>  Issue Type: Bug
>Reporter: Ashish Sharma
>Assignee: Dayakar M
>Priority: Blocker
>  Labels: pull-request-available
>
> *script used to repro*
> create table cbo_test (key string, v1 double, v2 decimal(30,2), v3 
> decimal(30,2));
> insert into cbo_test values ("00140006375905", 10230.72, 
> 10230.72, 10230.69), ("00140006375905", 10230.72, 10230.72, 
> 10230.69), ("00140006375905", 10230.72, 10230.72, 10230.69), 
> ("00140006375905", 10230.72, 10230.72, 10230.69), 
> ("00140006375905", 10230.72, 10230.72, 10230.69), 
> ("00140006375905", 10230.72, 10230.72, 10230.69);
> select stddev(v1), stddev(v2), stddev(v3) from cbo_test;
> *Enable CBO*
> ++
> |  Explain   |
> ++
> | Plan optimized by CBO. |
> ||
> | Vertex dependency in root stage|
> | Reducer 2 <- Map 1 (CUSTOM_SIMPLE_EDGE)|
> ||
> | Stage-0|
> |   Fetch Operator   |
> | limit:-1   |
> | Stage-1|
> |   Reducer 2 vectorized |
> |   File Output Operator [FS_13] |
> | Select Operator [SEL_12] (rows=1 width=24) |
> |   Output:["_col0","_col1","_col2"] |
> |   Group By Operator [GBY_11] (rows=1 width=72) |
> | 
> Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8"],aggregations:["sum(VALUE._col0)","sum(VALUE._col1)","count(VALUE._col2)","sum(VALUE._col3)","sum(VALUE._col4)","count(VALUE._col5)","sum(VALUE._col6)","sum(VALUE._col7)","count(VALUE._col8)"]
>  |
> |   <-Map 1 [CUSTOM_SIMPLE_EDGE] vectorized  |
> | PARTITION_ONLY_SHUFFLE [RS_10] |
> |   Group By Operator [GBY_9] (rows=1 width=72) |
> | 
> Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8"],aggregations:["sum(_col3)","sum(_col0)","count(_col0)","sum(_col5)","sum(_col4)","count(_col1)","sum(_col7)","sum(_col6)","count(_col2)"]
>  |
> | Select Operator [SEL_8] (rows=6 width=232) |
> |   
> Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7"] |
> |   TableScan [TS_0] (rows=6 width=232) |
> | default@cbo_test,cbo_test, ACID 
> table,Tbl:COMPLETE,Col:COMPLETE,Output:["v1","v2","v3"] |
> ||
> ++
> *Query Result* 
> _c0   _c1 _c2
> 0.0   NaN NaN
> *Disable CBO*
> ++
> |  Explain   |
> ++
> | Vertex dependency in root stage|
> | Reducer 2 <- Map 1 (CUSTOM_SIMPLE_EDGE)|
> ||
> | Stage-0|
> |   Fetch Operator   |
> | limit:-1   |
> | Stage-1|
> |   Reducer 2 vectorized |
> |   File Output Operator [FS_11] |
> | Group By Operator [GBY_10] (rows=1 width=24) |
> |   
> Output:["_col0","_col1","_col2"],aggregations:["stddev(VALUE._col0)","stddev(VALUE._col1)","stddev(VALUE._col2)"]
>  |
> | <-Map 1 [CUSTOM_SIMPLE_EDGE] vectorized|
> |   PARTITION_ONLY_SHUFFLE [RS_9]|
> | Group By Operator [GBY_8] (rows=1 width=240) |
> |   
> Output:["_col0","_col1","_col2"],aggregations:["stddev(v1)","stddev(v2)","stddev(v3)"]
>  |
> |   Select Operator [SEL_7] (rows=6 width=232) |
> | Output:["v1","v2","v3"]|
> | TableScan [TS_0] (rows=6 width=232) |
> |
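
A note on why the CBO plan can turn an (almost) constant column into NaN: the 
rewritten plan computes the aggregate from sums, sums of squares and counts, 
i.e. variance as sum(x^2)/n - (sum(x)/n)^2, and with limited precision that 
difference can come out slightly below zero, at which point the square root is 
NaN. This is a plausible reading of the plan rather than something confirmed in 
this thread; the sketch below only demonstrates the sqrt-of-a-negative-variance 
effect.

{code:java}
public class StddevNaNSketch {
  public static void main(String[] args) {
    // Decomposed formula: variance = E[x^2] - (E[x])^2. Exactly zero for a
    // constant column, but rounding can leave a tiny negative intermediate.
    double tinyNegativeVariance = -1e-12; // illustrative value, not Hive's actual intermediate
    System.out.println(Math.sqrt(tinyNegativeVariance)); // prints NaN
  }
}
{code}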

[jira] [Resolved] (HIVE-28266) Iceberg: select count(*) from data_files metadata tables gives wrong result

2024-05-22 Thread Denys Kuzmenko (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko resolved HIVE-28266.
---
Fix Version/s: 4.1.0
   Resolution: Fixed

> Iceberg: select count(*) from data_files metadata tables gives wrong result
> ---
>
> Key: HIVE-28266
> URL: https://issues.apache.org/jira/browse/HIVE-28266
> Project: Hive
>  Issue Type: Bug
>Reporter: Dmitriy Fingerman
>Assignee: Dmitriy Fingerman
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.1.0
>
>
> In Hive Iceberg, every table has a corresponding metadata table 
> "*.data_files" that contains info about the files that hold the table's data.
> select count(*) from a data_files metadata table returns the number of rows in 
> the data table instead of the number of data files from the metadata table.
>  
> {code:java}
> CREATE TABLE x (name VARCHAR(50), age TINYINT, num_clicks BIGINT) stored by 
> iceberg stored as orc TBLPROPERTIES 
> ('external.table.purge'='true','format-version'='2');
> insert into x values 
> ('amy', 35, 123412344),
> ('adxfvy', 36, 123412534),
> ('amsdfyy', 37, 123417234),
> ('asafmy', 38, 123412534);
> insert into x values 
> ('amerqwy', 39, 123441234),
> ('amyxzcv', 40, 123341234),
> ('erweramy', 45, 122341234);
> Select * from default.x.data_files;
> -- Returns 2 records in the output
> Select count(*) from default.x.data_files;
> -- Returns 7 instead of 2
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-28266) Iceberg: select count(*) from data_files metadata tables gives wrong result

2024-05-22 Thread Denys Kuzmenko (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko updated HIVE-28266:
--
Affects Version/s: 4.0.0

> Iceberg: select count(*) from data_files metadata tables gives wrong result
> ---
>
> Key: HIVE-28266
> URL: https://issues.apache.org/jira/browse/HIVE-28266
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0
>Reporter: Dmitriy Fingerman
>Assignee: Dmitriy Fingerman
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.1.0
>
>
> In Hive Iceberg, every table has a corresponding metadata table 
> "*.data_files" that contains info about the files that hold the table's data.
> select count(*) from a data_files metadata table returns the number of rows in 
> the data table instead of the number of data files from the metadata table.
>  
> {code:java}
> CREATE TABLE x (name VARCHAR(50), age TINYINT, num_clicks BIGINT) stored by 
> iceberg stored as orc TBLPROPERTIES 
> ('external.table.purge'='true','format-version'='2');
> insert into x values 
> ('amy', 35, 123412344),
> ('adxfvy', 36, 123412534),
> ('amsdfyy', 37, 123417234),
> ('asafmy', 38, 123412534);
> insert into x values 
> ('amerqwy', 39, 123441234),
> ('amyxzcv', 40, 123341234),
> ('erweramy', 45, 122341234);
> Select * from default.x.data_files;
> -- Returns 2 records in the output
> Select count(*) from default.x.data_files;
> -- Returns 7 instead of 2
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-28266) Iceberg: select count(*) from data_files metadata tables gives wrong result

2024-05-22 Thread Denys Kuzmenko (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-28266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17848501#comment-17848501
 ] 

Denys Kuzmenko commented on HIVE-28266:
---

Merged to master
Thanks [~difin] for the patch and [~zhangbutao] for the review!

> Iceberg: select count(*) from data_files metadata tables gives wrong result
> ---
>
> Key: HIVE-28266
> URL: https://issues.apache.org/jira/browse/HIVE-28266
> Project: Hive
>  Issue Type: Bug
>Reporter: Dmitriy Fingerman
>Assignee: Dmitriy Fingerman
>Priority: Major
>  Labels: pull-request-available
>
> In Hive Iceberg, every table has a corresponding metadata table 
> "*.data_files" that contains info about the files that hold the table's data.
> select count(*) from a data_files metadata table returns the number of rows in 
> the data table instead of the number of data files from the metadata table.
>  
> {code:java}
> CREATE TABLE x (name VARCHAR(50), age TINYINT, num_clicks BIGINT) stored by 
> iceberg stored as orc TBLPROPERTIES 
> ('external.table.purge'='true','format-version'='2');
> insert into x values 
> ('amy', 35, 123412344),
> ('adxfvy', 36, 123412534),
> ('amsdfyy', 37, 123417234),
> ('asafmy', 38, 123412534);
> insert into x values 
> ('amerqwy', 39, 123441234),
> ('amyxzcv', 40, 123341234),
> ('erweramy', 45, 122341234);
> Select * from default.x.data_files;
> -- Returns 2 records in the output
> Select count(*) from default.x.data_files;
> -- Returns 7 instead of 2
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-25351) stddev(), stddev_pop() with CBO enable returning null

2024-05-22 Thread Jiandan Yang (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-25351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17848487#comment-17848487
 ] 

Jiandan Yang  commented on HIVE-25351:
--

[~Dayakar] I encountered the same issue in Hive version 3.1.3, and from 
reviewing the code, it appears that the current master branch would have the 
same issue. I have fixed this problem in version 3.1.3. If no one is addressing 
this issue, I am prepared to take it over and resolve it.

> stddev(), stddev_pop() with CBO enable returning null
> -
>
> Key: HIVE-25351
> URL: https://issues.apache.org/jira/browse/HIVE-25351
> Project: Hive
>  Issue Type: Bug
>Reporter: Ashish Sharma
>Assignee: Dayakar M
>Priority: Blocker
>  Labels: pull-request-available
>
> *script used to repro*
> create table cbo_test (key string, v1 double, v2 decimal(30,2), v3 
> decimal(30,2));
> insert into cbo_test values ("00140006375905", 10230.72, 
> 10230.72, 10230.69), ("00140006375905", 10230.72, 10230.72, 
> 10230.69), ("00140006375905", 10230.72, 10230.72, 10230.69), 
> ("00140006375905", 10230.72, 10230.72, 10230.69), 
> ("00140006375905", 10230.72, 10230.72, 10230.69), 
> ("00140006375905", 10230.72, 10230.72, 10230.69);
> select stddev(v1), stddev(v2), stddev(v3) from cbo_test;
> *Enable CBO*
> ++
> |  Explain   |
> ++
> | Plan optimized by CBO. |
> ||
> | Vertex dependency in root stage|
> | Reducer 2 <- Map 1 (CUSTOM_SIMPLE_EDGE)|
> ||
> | Stage-0|
> |   Fetch Operator   |
> | limit:-1   |
> | Stage-1|
> |   Reducer 2 vectorized |
> |   File Output Operator [FS_13] |
> | Select Operator [SEL_12] (rows=1 width=24) |
> |   Output:["_col0","_col1","_col2"] |
> |   Group By Operator [GBY_11] (rows=1 width=72) |
> | 
> Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8"],aggregations:["sum(VALUE._col0)","sum(VALUE._col1)","count(VALUE._col2)","sum(VALUE._col3)","sum(VALUE._col4)","count(VALUE._col5)","sum(VALUE._col6)","sum(VALUE._col7)","count(VALUE._col8)"]
>  |
> |   <-Map 1 [CUSTOM_SIMPLE_EDGE] vectorized  |
> | PARTITION_ONLY_SHUFFLE [RS_10] |
> |   Group By Operator [GBY_9] (rows=1 width=72) |
> | 
> Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8"],aggregations:["sum(_col3)","sum(_col0)","count(_col0)","sum(_col5)","sum(_col4)","count(_col1)","sum(_col7)","sum(_col6)","count(_col2)"]
>  |
> | Select Operator [SEL_8] (rows=6 width=232) |
> |   
> Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7"] |
> |   TableScan [TS_0] (rows=6 width=232) |
> | default@cbo_test,cbo_test, ACID 
> table,Tbl:COMPLETE,Col:COMPLETE,Output:["v1","v2","v3"] |
> ||
> ++
> *Query Result* 
> _c0   _c1 _c2
> 0.0   NaN NaN
> *Disable CBO*
> ++
> |  Explain   |
> ++
> | Vertex dependency in root stage|
> | Reducer 2 <- Map 1 (CUSTOM_SIMPLE_EDGE)|
> ||
> | Stage-0|
> |   Fetch Operator   |
> | limit:-1   |
> | Stage-1|
> |   Reducer 2 vectorized |
> |   File Output Operator [FS_11] |
> | Group By Operator [GBY_10] (rows=1 width=24) |
> |   
> Output:["_col0","_col1","_col2"],aggregations:["stddev(VALUE._col0)","stddev(VALUE._col1)","stddev(VALUE._col2)"]
>  |
> | <-Map 1 [CUSTOM_SIMPLE_EDGE] vectorized|
> |   PARTITION_ONLY_SHUFFLE [RS_9]|
> | Group By Operator [GBY_8] (rows=1 width=240) |
> |   
> Output:["_col0","_col1","_col2"],aggregations:["stddev(v1)","stddev(v2)","stddev(v3)"]
>  |
> |   

[jira] [Commented] (HIVE-28258) Use Iceberg semantics for Merge task

2024-05-22 Thread Sourabh Badhya (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-28258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17848458#comment-17848458
 ] 

Sourabh Badhya commented on HIVE-28258:
---

[~kkasa], this task mainly tries to reuse the existing Iceberg readers 
(IcebergRecordReader) rather than the file-format readers chosen according to 
the table format. This way we can use the existing code for handling the 
different file formats (ORC, Parquet, Avro) within Iceberg and avoid writing 
custom implementations for these file formats.

Additionally, this will help in handling the different schemas that Iceberg 
maintains (the data schema and the delete schema) without exposing them through 
public APIs.

Custom hacks done earlier, such as changing the file format of the merge task, 
are also removed.

The existing test iceberg_merge_files.q should serve as an example for 
debugging the merge task used for Iceberg.

> Use Iceberg semantics for Merge task
> 
>
> Key: HIVE-28258
> URL: https://issues.apache.org/jira/browse/HIVE-28258
> Project: Hive
>  Issue Type: Improvement
>  Components: Iceberg integration
>Reporter: Sourabh Badhya
>Assignee: Sourabh Badhya
>Priority: Major
>  Labels: pull-request-available
>
> Use Iceberg semantics for Merge task, instead of normal ORC or parquet 
> readers.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)