[jira] [Updated] (HIVE-27924) Incremental rebuild goes wrong when inserts and deletes overlap between the source tables

2024-02-21 Thread Krisztian Kasa (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Kasa updated HIVE-27924:
--
Fix Version/s: 4.1.0
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

> Incremental rebuild goes wrong when inserts and deletes overlap between the 
> source tables
> -
>
> Key: HIVE-27924
> URL: https://issues.apache.org/jira/browse/HIVE-27924
> Project: Hive
>  Issue Type: Bug
>  Components: Materialized views
>Affects Versions: 4.0.0-beta-1
> Environment: * Docker version : 19.03.6
>  * Hive version : 4.0.0-beta-1
>  * Driver version : Hive JDBC (4.0.0-beta-1)
>  * Beeline version : 4.0.0-beta-1
>Reporter: Wenhao Li
>Assignee: Krisztian Kasa
>Priority: Critical
>  Labels: bug, hive, hive-4.1.0-must, known_issue, 
> materializedviews, pull-request-available
> Fix For: 4.1.0
>
> Attachments: 截图.PNG, 截图1.PNG, 截图2.PNG, 截图3.PNG, 截图4.PNG, 截图5.PNG, 
> 截图6.PNG, 截图7.PNG, 截图8.PNG, 截图9.PNG
>
>
> h1. Summary
> The incremental rebuild plan and its execution output are incorrect when one 
> side of the join has inserted join keys that the other side has deleted, or 
> vice versa (note the order).
> The argument is that tuples that were never present simultaneously should 
> not interact with one another, i.e., one table's inserts should not join the 
> other table's deletes.
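This argument can be sketched as a toy model (plain Python, not Hive internals; the write ids are invented for illustration): two delta rows may only join if their visibility intervals overlap, and a row inserted in the same transaction that deleted the other row never coexisted with it.

```python
# Toy model of the incremental-rebuild argument (not Hive code).
# Each row records the half-open interval [created, deleted) of write ids
# during which it was visible; INF means "still live".
INF = float("inf")

def coexisted(row_a, row_b):
    """True iff the two rows were visible in some common snapshot."""
    return (max(row_a["created"], row_b["created"])
            < min(row_a["deleted"], row_b["deleted"]))

# kevin is inserted only after bonnie is deleted (illustrative write ids):
kevin  = {"a": 1, "b": "kevin",  "created": 3, "deleted": INF}
bonnie = {"a": 1, "b": "bonnie", "created": 1, "deleted": 3}
bob    = {"a": 1, "b": "bob",    "created": 1, "deleted": INF}

print(coexisted(kevin, bonnie))  # False: kevin's insert must not join bonnie's delete
print(coexisted(kevin, bob))     # True: these rows did coexist
```

A correct incremental rebuild must apply this kind of visibility check when joining one table's insert delta against the other table's delete delta; the repro below shows the plan that omits it.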
> h1. Related Test Case
> The bug was discovered while reproducing the test case:
> ??hive/ql/src/test/queries/clientpositive/materialized_view_create_rewrite_5.q??
> h1. Steps to Reproduce the Issue
>  # Configurations:
> {code:sql}
> SET hive.vectorized.execution.enabled=false;
> set hive.support.concurrency=true;
> set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
> set hive.strict.checks.cartesian.product=false;
> set hive.materializedview.rewriting=true;{code}
>  # 
> {code:sql}
> create table cmv_basetable_n6 (a int, b varchar(256), c decimal(10,2), d int) 
> stored as orc TBLPROPERTIES ('transactional'='true'); {code}
>  # 
> {code:sql}
> insert into cmv_basetable_n6 values
> (1, 'alfred', 10.30, 2),
> (1, 'charlie', 20.30, 2); {code}
>  # 
> {code:sql}
> create table cmv_basetable_2_n3 (a int, b varchar(256), c decimal(10,2), d 
> int) stored as orc TBLPROPERTIES ('transactional'='true'); {code}
>  # 
> {code:sql}
> insert into cmv_basetable_2_n3 values
> (1, 'bob', 30.30, 2),
> (1, 'bonnie', 40.30, 2);{code}
>  # 
> {code:sql}
> CREATE MATERIALIZED VIEW cmv_mat_view_n6 TBLPROPERTIES 
> ('transactional'='true') AS
> SELECT cmv_basetable_n6.a, cmv_basetable_2_n3.c
> FROM cmv_basetable_n6 JOIN cmv_basetable_2_n3 ON (cmv_basetable_n6.a = 
> cmv_basetable_2_n3.a)
> WHERE cmv_basetable_2_n3.c > 10.0;{code}
>  # 
> {code:sql}
> show tables; {code}
> !截图.PNG!
>  # Select tuples, including deleted rows and virtual columns, from the MV 
> and the source tables. We see that the MV is correctly built upon creation:
> {code:sql}
> SELECT ROW__IS__DELETED, ROW__ID, * FROM 
> cmv_mat_view_n6('acid.fetch.deleted.rows'='true');{code}
> !截图1.PNG!
>  # 
> {code:sql}
> SELECT ROW__IS__DELETED, ROW__ID, * FROM 
> cmv_basetable_n6('acid.fetch.deleted.rows'='true'); {code}
> !截图2.PNG!
>  # 
> {code:sql}
> SELECT ROW__IS__DELETED, ROW__ID, * FROM 
> cmv_basetable_2_n3('acid.fetch.deleted.rows'='true'); {code}
> !截图3.PNG!
>  # Now insert into the LHS source table and delete from the RHS source table:
> {code:sql}
> insert into cmv_basetable_n6 values
> (1, 'kevin', 50.30, 2);
> DELETE FROM cmv_basetable_2_n3 WHERE b = 'bonnie';{code}
>  # Select again to see what happened:
> {code:sql}
> SELECT ROW__IS__DELETED, ROW__ID, * FROM 
> cmv_basetable_n6('acid.fetch.deleted.rows'='true'); {code}
> !截图4.PNG!
>  # 
> {code:sql}
> SELECT ROW__IS__DELETED, ROW__ID, * FROM 
> cmv_basetable_2_n3('acid.fetch.deleted.rows'='true'); {code}
> !截图5.PNG!
>  # Use {{EXPLAIN CBO}} to produce the incremental rebuild plan for the MV; 
> the plan is already incorrect:
> {code:sql}
> EXPLAIN CBO
> ALTER MATERIALIZED VIEW cmv_mat_view_n6 REBUILD; {code}
> !截图6.PNG!
>  # Rebuild MV and see (incorrect) results:
> {code:sql}
> ALTER MATERIALIZED VIEW cmv_mat_view_n6 REBUILD;
> SELECT ROW__IS__DELETED, ROW__ID, * FROM 
> cmv_mat_view_n6('acid.fetch.deleted.rows'='true');{code}
> !截图7.PNG!
>  # Run the MV definition directly; it outputs incorrect results because 
> MV-based query rewrite is enabled for this MV, i.e., the following query 
> returns whatever is currently in the MV:
> {code:sql}
> SELECT cmv_basetable_n6.a, cmv_basetable_2_n3.c
> FROM cmv_basetable_n6 JOIN cmv_basetable_2_n3 ON (cmv_basetable_n6.a = 
> cmv_basetable_2_n3.a)
> WHERE cmv_basetable_2_n3.c > 10.0; {code}
> !截图8.PNG!
>  # Disable MV-based query rewrite 

[jira] [Commented] (HIVE-27924) Incremental rebuild goes wrong when inserts and deletes overlap between the source tables

2024-02-21 Thread Krisztian Kasa (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17819506#comment-17819506
 ] 

Krisztian Kasa commented on HIVE-27924:
---

Merged to master. Thanks [~dkuzmenko] for reviewing the patch and [~wenhaoli] 
for the detailed repro steps.


[jira] [Commented] (HIVE-27898) When CBO is disabled, HIVE cannot query ICEBERG tables in nested subqueries properly

2024-02-21 Thread yongzhi.shao (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17819493#comment-17819493
 ] 

yongzhi.shao commented on HIVE-27898:
-

Is testdb.test_data_02 an Iceberg table?

> When CBO is disabled, HIVE cannot query ICEBERG tables in nested subqueries 
> properly
> 
>
> Key: HIVE-27898
> URL: https://issues.apache.org/jira/browse/HIVE-27898
> Project: Hive
>  Issue Type: Bug
>  Components: Iceberg integration
>Affects Versions: 4.0.0, 4.0.0-beta-1
>Reporter: yongzhi.shao
>Priority: Critical
> Attachments: hive-site.xml
>
>
> We found that in Hive 4.0.0-beta-1, when an Iceberg table is used in a 
> subquery, the query returns no data in the end.
> We cross-validated with Hive 3 on Tez; Hive 3 does not have this problem 
> when querying Iceberg.
> {code:java}
> --spark3.4.1+iceberg 1.4.2
> CREATE TABLE datacenter.dwd.b_std_trade (
>   uni_order_id STRING,
>   data_from BIGINT,
>   partner STRING,
>   plat_code STRING,
>   order_id STRING,
>   uni_shop_id STRING,
>   uni_id STRING,
>   guide_id STRING,
>   shop_id STRING,
>   plat_account STRING,
>   total_fee DOUBLE,
>   item_discount_fee DOUBLE,
>   trade_discount_fee DOUBLE,
>   adjust_fee DOUBLE,
>   post_fee DOUBLE,
>   discount_rate DOUBLE,
>   payment_no_postfee DOUBLE,
>   payment DOUBLE,
>   pay_time STRING,
>   product_num BIGINT,
>   order_status STRING,
>   is_refund STRING,
>   refund_fee DOUBLE,
>   insert_time STRING,
>   created STRING,
>   endtime STRING,
>   modified STRING,
>   trade_type STRING,
>   receiver_name STRING,
>   receiver_country STRING,
>   receiver_state STRING,
>   receiver_city STRING,
>   receiver_district STRING,
>   receiver_town STRING,
>   receiver_address STRING,
>   receiver_mobile STRING,
>   trade_source STRING,
>   delivery_type STRING,
>   consign_time STRING,
>   orders_num BIGINT,
>   is_presale BIGINT,
>   presale_status STRING,
>   first_fee_paytime STRING,
>   last_fee_paytime STRING,
>   first_paid_fee DOUBLE,
>   tenant STRING,
>   tidb_modified STRING,
>   step_paid_fee DOUBLE,
>   seller_flag STRING,
>   is_used_store_card BIGINT,
>   store_card_used DOUBLE,
>   store_card_basic_used DOUBLE,
>   store_card_expand_used DOUBLE,
>   order_promotion_num BIGINT,
>   item_promotion_num BIGINT,
>   buyer_remark STRING,
>   seller_remark STRING,
>   trade_business_type STRING)
> USING iceberg
> PARTITIONED BY (uni_shop_id, truncate(4, created))
> LOCATION '/iceberg-catalog/warehouse/dwd/b_std_trade'
> TBLPROPERTIES (
>   'current-snapshot-id' = '7217819472703702905',
>   'format' = 'iceberg/orc',
>   'format-version' = '1',
>   'hive.stored-as' = 'iceberg',
>   'read.orc.vectorization.enabled' = 'true',
>   'sort-order' = 'uni_shop_id ASC NULLS FIRST, created ASC NULLS FIRST',
>   'write.distribution-mode' = 'hash',
>   'write.format.default' = 'orc',
>   'write.metadata.delete-after-commit.enabled' = 'true',
>   'write.metadata.previous-versions-max' = '3',
>   'write.orc.bloom.filter.columns' = 'order_id',
>   'write.orc.compression-codec' = 'zstd')
> --hive-iceberg
>  CREATE EXTERNAL TABLE iceberg_dwd.b_std_trade 
>  STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler' 
> LOCATION 'hdfs:///iceberg-catalog/warehouse/dwd/b_std_trade'
> TBLPROPERTIES 
> ('iceberg.catalog'='location_based_table','engine.hive.enabled'='true');
> select * from iceberg_dwd.b_std_trade 
> where uni_shop_id = 'TEST|1' limit 10  --10 rows
> select *
> from ( 
> select * from iceberg_dwd.b_std_trade 
> where uni_shop_id = 'TEST|1' limit 10
> ) t1;   --10 rows
> select uni_shop_id
> from ( 
> select * from iceberg_dwd.b_std_trade 
> where uni_shop_id = 'TEST|1' limit 10
> ) t1;  --0 rows
> select uni_shop_id
> from ( 
> select uni_shop_id as uni_shop_id from iceberg_dwd.b_std_trade 
> where uni_shop_id = 'TEST|1' limit 10
> ) t1;  --0 rows
> --hive-orc
> select uni_shop_id
> from ( 
> select * from iceberg_dwd.trade_test 
> where uni_shop_id = 'TEST|1' limit 10
> ) t1;--10 ROWS{code}
>  
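The invariant the queries above exercise can be stated as a toy check (plain Python, not Hive internals; the table contents are invented for illustration): projecting a single column out of a subquery must never change its row count, yet the report shows `SELECT *` returning 10 rows while the projected form returns 0.

```python
# Toy relational check (not Hive internals; data is illustrative).
# Projecting a column from a subquery must preserve its cardinality.
rows = [{"uni_shop_id": "TEST|1", "order_id": str(i)} for i in range(25)]

def subquery(table, limit=10):
    # Models: select * from t where uni_shop_id = 'TEST|1' limit 10
    return [r for r in table if r["uni_shop_id"] == "TEST|1"][:limit]

select_star = subquery(rows)                               # 10 rows
select_col  = [r["uni_shop_id"] for r in subquery(rows)]   # must also be 10 rows

print(len(select_star), len(select_col))  # 10 10
```

Any engine in which the projected form returns a different count than the `SELECT *` form, as reported here for the Iceberg-backed table, is violating this equivalence.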



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] (HIVE-27898) When CBO is disabled, HIVE cannot query ICEBERG tables in nested subqueries properly

2024-02-21 Thread yongzhi.shao (Jira)


[ https://issues.apache.org/jira/browse/HIVE-27898 ]


yongzhi.shao deleted comment on HIVE-27898:
-

was (Author: lisoda):
Is testdb.test_data_02 an Iceberg table?



[jira] [Commented] (HIVE-27898) When CBO is disabled, HIVE cannot query ICEBERG tables in nested subqueries properly

2024-02-21 Thread Butao Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17819491#comment-17819491
 ] 

Butao Zhang commented on HIVE-27898:


*After disabling CBO, the query is also fine.*

 

// set conf in beeline:
set hive.fetch.task.conversion=none;  // force launching a Tez task
set hive.explain.user=false;
set hive.cbo.enable=false;
 
{code:java}
0: jdbc:hive2://127.0.0.1:1/default> show create table testdb.test_data_02;
INFO  : Compiling 
command(queryId=hive_20240222143047_1bb04d39-fc35-40e3-a897-d4211d3f5e9e): show 
create table  testdb.test_data_02
INFO  : Semantic Analysis Completed (retrial = false)
INFO  : Created Hive schema: 
Schema(fieldSchemas:[FieldSchema(name:createtab_stmt, type:string, comment:from 
deserializer)], properties:null)
INFO  : Completed compiling 
command(queryId=hive_20240222143047_1bb04d39-fc35-40e3-a897-d4211d3f5e9e); Time 
taken: 0.045 seconds
INFO  : Concurrency mode is disabled, not creating a lock manager
INFO  : Executing 
command(queryId=hive_20240222143047_1bb04d39-fc35-40e3-a897-d4211d3f5e9e): show 
create table  testdb.test_data_02
INFO  : Starting task [Stage-0:DDL] in serial mode
INFO  : Completed executing 
command(queryId=hive_20240222143047_1bb04d39-fc35-40e3-a897-d4211d3f5e9e); Time 
taken: 0.057 seconds
++
|                   createtab_stmt                   |
++
| CREATE EXTERNAL TABLE `testdb`.`test_data_02`(     |
|   `id` string,                                     |
|   `name` string)                                   |
| PARTITIONED BY SPEC (                              |
| `name`)                                            |
| ROW FORMAT SERDE                                   |
|   'org.apache.iceberg.mr.hive.HiveIcebergSerDe'    |
| STORED BY                                          |
|   'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler'  |
| WITH SERDEPROPERTIES (                             |
|   'serialization.format'='1')                      |
| LOCATION                                           |
|   'hdfs://127.0.0.1:8028/tmp/testiceberg/default/test_data_02' |
| TBLPROPERTIES (                                    |
|   'bucketing_version'='2',                         |
|   'current-snapshot-id'='6130266132019241842',     |
|   'engine.hive.enabled'='true',                    |
|   'format-version'='2',                            |
|   'iceberg.catalog'='location_based_table',        |
|   'table_type'='ICEBERG',                          |
|   'transient_lastDdlTime'='1708508832')            |
++
21 rows selected (0.482 seconds)
0: jdbc:hive2://127.0.0.1:1/default>
0: jdbc:hive2://127.0.0.1:1/default>
0: jdbc:hive2://127.0.0.1:1/default> explain select * from (select *  from 
testdb.test_data_02 limit 10) s1;
INFO  : Compiling 
command(queryId=hive_20240222143051_e69515c5-6c8d-4bd7-8973-faa3f9ed6d97): 
explain select * from (select *  from testdb.test_data_02 limit 10) s1
INFO  : Semantic Analysis Completed (retrial = false)
INFO  : Created Hive schema: Schema(fieldSchemas:[FieldSchema(name:Explain, 
type:string, comment:null)], properties:null)
INFO  : Completed compiling 
command(queryId=hive_20240222143051_e69515c5-6c8d-4bd7-8973-faa3f9ed6d97); Time 
taken: 0.09 seconds
INFO  : Concurrency mode is disabled, not creating a lock manager
INFO  : Executing 
command(queryId=hive_20240222143051_e69515c5-6c8d-4bd7-8973-faa3f9ed6d97): 
explain select * from (select *  from testdb.test_data_02 limit 10) s1
INFO  : Starting task [Stage-3:EXPLAIN] in serial mode
INFO  : Completed executing 
command(queryId=hive_20240222143051_e69515c5-6c8d-4bd7-8973-faa3f9ed6d97); Time 
taken: 0.008 seconds
++
|                      Explain                       |
++
| STAGE DEPENDENCIES:                                |
|   Stage-1 is a root stage                          |
|   Stage-0 depends on stages: Stage-1               |
|                                                    |
| STAGE PLANS:                                       |
|   Stage: Stage-1                                   |
|     Tez                                            |
|       DagId: hive_20240222143051_e69515c5-6c8d-4bd7-8973-faa3f9ed6d97:18 |
|       Edges:                                       |
|         Reducer 2 <- Map 1 (CUSTOM_SIMPLE_EDGE)    |
|       DagName: hive_20240222143051_e69515c5-6c8d-4bd7-8973-faa3f9ed6d97:18 |
|       Vertices:                                    |
|         Map 1                                      |
|             Map Operator Tree:                     |
|                 TableScan                          |
|                   alias: test_data_02              |
|                   

[jira] [Updated] (HIVE-27898) When CBO is disabled, HIVE cannot query ICEBERG tables in nested subqueries properly

2024-02-21 Thread yongzhi.shao (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yongzhi.shao updated HIVE-27898:

Affects Version/s: 4.0.0



[jira] [Updated] (HIVE-27898) When CBO is disabled, HIVE cannot query ICEBERG tables in nested subqueries properly

2024-02-21 Thread yongzhi.shao (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yongzhi.shao updated HIVE-27898:

Issue Type: Bug  (was: Improvement)



[jira] [Updated] (HIVE-27898) When CBO is disabled, HIVE cannot query ICEBERG tables in nested subqueries properly

2024-02-21 Thread yongzhi.shao (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yongzhi.shao updated HIVE-27898:

Summary: When CBO is disabled, HIVE cannot query ICEBERG tables in nested 
subqueries properly  (was: 当禁用CBO后,HIVE无法正常查询嵌套子查询中的ICEBERG表.)

> When CBO is disabled, HIVE cannot query ICEBERG tables in nested subqueries 
> properly
> 
>
> Key: HIVE-27898
> URL: https://issues.apache.org/jira/browse/HIVE-27898
> Project: Hive
>  Issue Type: Improvement
>  Components: Iceberg integration
>Affects Versions: 4.0.0-beta-1
>Reporter: yongzhi.shao
>Priority: Critical
> Attachments: hive-site.xml
>
>
> We found that with the HIVE4-BETA1 version, if we use an ICEBERG table in a 
> subquery, we get no data back.
> I cross-validated with HIVE3 on TEZ, and HIVE3 does not have this problem 
> when querying ICEBERG.
> {code:java}
> --spark3.4.1+iceberg 1.4.2
> CREATE TABLE datacenter.dwd.b_std_trade (
>   uni_order_id STRING,
>   data_from BIGINT,
>   partner STRING,
>   plat_code STRING,
>   order_id STRING,
>   uni_shop_id STRING,
>   uni_id STRING,
>   guide_id STRING,
>   shop_id STRING,
>   plat_account STRING,
>   total_fee DOUBLE,
>   item_discount_fee DOUBLE,
>   trade_discount_fee DOUBLE,
>   adjust_fee DOUBLE,
>   post_fee DOUBLE,
>   discount_rate DOUBLE,
>   payment_no_postfee DOUBLE,
>   payment DOUBLE,
>   pay_time STRING,
>   product_num BIGINT,
>   order_status STRING,
>   is_refund STRING,
>   refund_fee DOUBLE,
>   insert_time STRING,
>   created STRING,
>   endtime STRING,
>   modified STRING,
>   trade_type STRING,
>   receiver_name STRING,
>   receiver_country STRING,
>   receiver_state STRING,
>   receiver_city STRING,
>   receiver_district STRING,
>   receiver_town STRING,
>   receiver_address STRING,
>   receiver_mobile STRING,
>   trade_source STRING,
>   delivery_type STRING,
>   consign_time STRING,
>   orders_num BIGINT,
>   is_presale BIGINT,
>   presale_status STRING,
>   first_fee_paytime STRING,
>   last_fee_paytime STRING,
>   first_paid_fee DOUBLE,
>   tenant STRING,
>   tidb_modified STRING,
>   step_paid_fee DOUBLE,
>   seller_flag STRING,
>   is_used_store_card BIGINT,
>   store_card_used DOUBLE,
>   store_card_basic_used DOUBLE,
>   store_card_expand_used DOUBLE,
>   order_promotion_num BIGINT,
>   item_promotion_num BIGINT,
>   buyer_remark STRING,
>   seller_remark STRING,
>   trade_business_type STRING)
> USING iceberg
> PARTITIONED BY (uni_shop_id, truncate(4, created))
> LOCATION '/iceberg-catalog/warehouse/dwd/b_std_trade'
> TBLPROPERTIES (
>   'current-snapshot-id' = '7217819472703702905',
>   'format' = 'iceberg/orc',
>   'format-version' = '1',
>   'hive.stored-as' = 'iceberg',
>   'read.orc.vectorization.enabled' = 'true',
>   'sort-order' = 'uni_shop_id ASC NULLS FIRST, created ASC NULLS FIRST',
>   'write.distribution-mode' = 'hash',
>   'write.format.default' = 'orc',
>   'write.metadata.delete-after-commit.enabled' = 'true',
>   'write.metadata.previous-versions-max' = '3',
>   'write.orc.bloom.filter.columns' = 'order_id',
>   'write.orc.compression-codec' = 'zstd')
> --hive-iceberg
>  CREATE EXTERNAL TABLE iceberg_dwd.b_std_trade 
>  STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler' 
> LOCATION 'hdfs:///iceberg-catalog/warehouse/dwd/b_std_trade'
> TBLPROPERTIES 
> ('iceberg.catalog'='location_based_table','engine.hive.enabled'='true');
> select * from iceberg_dwd.b_std_trade 
> where uni_shop_id = 'TEST|1' limit 10;  --10 rows
> select *
> from ( 
> select * from iceberg_dwd.b_std_trade 
> where uni_shop_id = 'TEST|1' limit 10
> ) t1;   --10 rows
> select uni_shop_id
> from ( 
> select * from iceberg_dwd.b_std_trade 
> where uni_shop_id = 'TEST|1' limit 10
> ) t1;  --0 rows
> select uni_shop_id
> from ( 
> select uni_shop_id as uni_shop_id from iceberg_dwd.b_std_trade 
> where uni_shop_id = 'TEST|1' limit 10
> ) t1;  --0 rows
> --hive-orc
> select uni_shop_id
> from ( 
> select * from iceberg_dwd.trade_test 
> where uni_shop_id = 'TEST|1' limit 10
> ) t1;--10 ROWS{code}
>  





[jira] [Commented] (HIVE-27898) HIVE4 can't use ICEBERG table in subqueries

2024-02-21 Thread yongzhi.shao (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17819479#comment-17819479
 ] 

yongzhi.shao commented on HIVE-27898:
-

[~zhangbutao] :

Hi, I found the trigger: can you disable CBO and try again?

set hive.cbo.enable=false;
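
To make the toggle concrete, here is a minimal sketch of the repro implied above. The table and filter value are taken from the quoted report; whether your environment reproduces the 0-row result depends on the Hive build in use:

{code:sql}
-- Sketch only: the report says the symptom appears when CBO is disabled.
set hive.cbo.enable=false;

-- Per the report, projecting a single column through a LIMIT-ed
-- subquery over an Iceberg table returns 0 rows instead of 10.
select uni_shop_id
from (
  select * from iceberg_dwd.b_std_trade
  where uni_shop_id = 'TEST|1'
  limit 10
) t1;

-- Re-enabling CBO is the implied workaround.
set hive.cbo.enable=true;
{code}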






[jira] [Updated] (HIVE-27898) When CBO is disabled, HIVE cannot query ICEBERG tables in nested subqueries properly

2024-02-21 Thread yongzhi.shao (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yongzhi.shao updated HIVE-27898:

Summary: When CBO is disabled, HIVE cannot query ICEBERG tables in nested 
subqueries properly [original summary in Chinese]  (was: HIVE4 can't use ICEBERG 
table in subqueries)






[jira] [Updated] (HIVE-27898) HIVE4 can't use ICEBERG table in subqueries

2024-02-21 Thread yongzhi.shao (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yongzhi.shao updated HIVE-27898:

Attachment: hive-site.xml






[jira] [Updated] (HIVE-27898) HIVE4 can't use ICEBERG table in subqueries

2024-02-21 Thread yongzhi.shao (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yongzhi.shao updated HIVE-27898:

Attachment: (was: hive-site.xml)






[jira] [Comment Edited] (HIVE-27898) HIVE4 can't use ICEBERG table in subqueries

2024-02-21 Thread yongzhi.shao (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17819475#comment-17819475
 ] 

yongzhi.shao edited comment on HIVE-27898 at 2/22/24 6:04 AM:
--

[~zhangbutao] 

I tried the master branch with the latest code as of today, and the problem 
persists.

Note that the anomaly in the Submit Plan metric of the Tez statistics has been 
fixed, which confirms I am running the latest code.

Also, I noticed that in your previous test one of the tables 
(testdb.test_data_02) is not an ICEBERG table. This SQL runs without problems 
against regular managed tables; only the ICEBERG table fails.
{code:java}
0: jdbc:hive2://smaster01:1> select version();
INFO  : Compiling 
command(queryId=hive_20240222134713_2aaf182e-be55-4946-9a7c-c19d4a047e13): 
select version()
INFO  : Semantic Analysis Completed (retrial = false)
INFO  : Created Hive schema: Schema(fieldSchemas:[FieldSchema(name:_c0, 
type:string, comment:null)], properties:null)
INFO  : Completed compiling 
command(queryId=hive_20240222134713_2aaf182e-be55-4946-9a7c-c19d4a047e13); Time 
taken: 1.974 seconds
INFO  : Concurrency mode is disabled, not creating a lock manager
INFO  : Executing 
command(queryId=hive_20240222134713_2aaf182e-be55-4946-9a7c-c19d4a047e13): 
select version()
INFO  : Completed executing 
command(queryId=hive_20240222134713_2aaf182e-be55-4946-9a7c-c19d4a047e13); Time 
taken: 0.004 seconds
_c0  4.0.0-beta-2-SNAPSHOT r6e061e6559522c8a060c1b55439ada0001bf5e5d




0: jdbc:hive2://smaster01:1>  select * from (select * from 
iceberg_dwd.test_data_04 limit 10) s1;
INFO  : Compiling 
command(queryId=hive_20240222134526_87055172-e808-4726-acdf-89f6b8b815e6): 
select * from (select * from iceberg_dwd.test_data_04 limit 10) s1
INFO  : Semantic Analysis Completed (retrial = false)
INFO  : Created Hive schema: Schema(fieldSchemas:[FieldSchema(name:s1.id, 
type:string, comment:null), FieldSchema(name:s1.name, type:string, 
comment:null)], properties:null)
INFO  : Completed compiling 
command(queryId=hive_20240222134526_87055172-e808-4726-acdf-89f6b8b815e6); Time 
taken: 1.007 seconds
INFO  : Concurrency mode is disabled, not creating a lock manager
INFO  : Executing 
command(queryId=hive_20240222134526_87055172-e808-4726-acdf-89f6b8b815e6): 
select * from (select * from iceberg_dwd.test_data_04 limit 10) s1
INFO  : Query ID = hive_20240222134526_87055172-e808-4726-acdf-89f6b8b815e6
INFO  : Total jobs = 1
INFO  : Launching Job 1 out of 1
INFO  : Starting task [Stage-1:MAPRED] in serial mode
INFO  : Subscribed to counters: [] for queryId: 
hive_20240222134526_87055172-e808-4726-acdf-89f6b8b815e6
INFO  : Tez session hasn't been created yet. Opening session
INFO  : Dag name: select * from (select..a_04 limit 10) s1 (Stage-1)
INFO  : HS2 Host: [smaster01], Query ID: 
[hive_20240222134526_87055172-e808-4726-acdf-89f6b8b815e6], Dag ID: 
[dag_1706163520799_39421_1], DAG Session ID: [application_1706163520799_39421]
INFO  : Status: Running (Executing on YARN cluster with App id application_1706163520799_39421)
----------------------------------------------------------------------------------------------
        VERTICES      MODE        STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
----------------------------------------------------------------------------------------------
Map 1 ..........      container     SUCCEEDED      1          1        0        0       0       0
Reducer 2 ......      container     SUCCEEDED      1          1        0        0       0       0
----------------------------------------------------------------------------------------------
VERTICES: 02/02  [==========================>>] 100%  ELAPSED TIME: 20.10 s
----------------------------------------------------------------------------------------------
INFO  : Status: DAG finished successfully in 19.90 seconds
INFO  : DAG ID: dag_1706163520799_39421_1
INFO  : 
INFO  : Query Execution Summary
INFO  : 
--
INFO  : OPERATION                            DURATION
INFO  : 
--
INFO  : Compile Query                           1.01s
INFO  : Prepare Plan                           16.62s
INFO  : Get Query Coordinator (AM)              0.03s
INFO  : Submit Plan                             0.54s
INFO  : Start DAG                               0.07s
INFO  : Run DAG                                19.90s
INFO  : 
--
INFO  : 
INFO  : Task Execution Summary
INFO  : 
--
INFO  :   VERTICES      DURATION(ms)   CPU_TIME(ms)    

[jira] [Comment Edited] (HIVE-27898) HIVE4 can't use ICEBERG table in subqueries

2024-02-21 Thread yongzhi.shao (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17819475#comment-17819475
 ] 

yongzhi.shao edited comment on HIVE-27898 at 2/22/24 6:03 AM:
--

[~zhangbutao] 

I tried the master branch, using the latest updated code today, and the problem 
persists.

Please note that in the statistics of Tez, the anomaly in the statistics of the 
Submit Plan metric has been fixed. This proves that I am using the latest 
version of the code.

Also, I noticed in your previous test that one of the tables is not an ICEBERG 
table. (
testdb.test_data_02
) This SQL executes normal internal tables without problems. It is the ICEBERG 
table that has problems executing.
{code:java}
0: jdbc:hive2://smaster01:1> select version();
INFO  : Compiling 
command(queryId=hive_20240222134713_2aaf182e-be55-4946-9a7c-c19d4a047e13): 
select version()
INFO  : Semantic Analysis Completed (retrial = false)
INFO  : Created Hive schema: Schema(fieldSchemas:[FieldSchema(name:_c0, 
type:string, comment:null)], properties:null)
INFO  : Completed compiling 
command(queryId=hive_20240222134713_2aaf182e-be55-4946-9a7c-c19d4a047e13); Time 
taken: 1.974 seconds
INFO  : Concurrency mode is disabled, not creating a lock manager
INFO  : Executing 
command(queryId=hive_20240222134713_2aaf182e-be55-4946-9a7c-c19d4a047e13): 
select version()
INFO  : Completed executing 
command(queryId=hive_20240222134713_2aaf182e-be55-4946-9a7c-c19d4a047e13); Time 
taken: 0.004 seconds
_c0  4.0.0-beta-2-SNAPSHOT r6e061e6559522c8a060c1b55439ada0001bf5e5d




0: jdbc:hive2://smaster01:1>  select * from (select * from 
iceberg_dwd.test_data_04 limit 10) s1;
INFO  : Compiling 
command(queryId=hive_20240222134526_87055172-e808-4726-acdf-89f6b8b815e6): 
select * from (select * from iceberg_dwd.test_data_04 limit 10) s1
INFO  : Semantic Analysis Completed (retrial = false)
INFO  : Created Hive schema: Schema(fieldSchemas:[FieldSchema(name:s1.id, 
type:string, comment:null), FieldSchema(name:s1.name, type:string, 
comment:null)], properties:null)
INFO  : Completed compiling 
command(queryId=hive_20240222134526_87055172-e808-4726-acdf-89f6b8b815e6); Time 
taken: 1.007 seconds
INFO  : Concurrency mode is disabled, not creating a lock manager
INFO  : Executing 
command(queryId=hive_20240222134526_87055172-e808-4726-acdf-89f6b8b815e6): 
select * from (select * from iceberg_dwd.test_data_04 limit 10) s1
INFO  : Query ID = hive_20240222134526_87055172-e808-4726-acdf-89f6b8b815e6
INFO  : Total jobs = 1
INFO  : Launching Job 1 out of 1
INFO  : Starting task [Stage-1:MAPRED] in serial mode
INFO  : Subscribed to counters: [] for queryId: 
hive_20240222134526_87055172-e808-4726-acdf-89f6b8b815e6
INFO  : Tez session hasn't been created yet. Opening session
INFO  : Dag name: select * from (select..a_04 limit 10) s1 (Stage-1)
INFO  : HS2 Host: [smaster01], Query ID: 
[hive_20240222134526_87055172-e808-4726-acdf-89f6b8b815e6], Dag ID: 
[dag_1706163520799_39421_1], DAG Session ID: [application_1706163520799_39421]
INFO  : Status: Running (Executing on YARN cluster with App id 
application_1706163520799_39421)--
        VERTICES      MODE        STATUS  TOTAL  COMPLETED  RUNNING  PENDING  
FAILED  KILLED  
--
Map 1 .. container     SUCCEEDED      1          1        0        0    
   0       0  
Reducer 2 .. container     SUCCEEDED      1          1        0        0    
   0       0  
--
VERTICES: 02/02  [==>>] 100%  ELAPSED TIME: 20.10 s    
--
INFO  : Status: DAG finished successfully in 19.90 seconds
INFO  : DAG ID: dag_1706163520799_39421_1
INFO  : 
INFO  : Query Execution Summary
INFO  : 
--
INFO  : OPERATION                            DURATION
INFO  : 
--
INFO  : Compile Query                           1.01s
INFO  : Prepare Plan                           16.62s
INFO  : Get Query Coordinator (AM)              0.03s
INFO  : Submit Plan                             0.54s
INFO  : Start DAG                               0.07s
INFO  : Run DAG                                19.90s
INFO  : 
--
INFO  : 
INFO  : Task Execution Summary
INFO  : 
--
INFO  :   VERTICES      DURATION(ms)   CPU_TIME(ms)  

[jira] [Comment Edited] (HIVE-27898) HIVE4 can't use ICEBERG table in subqueries

2024-02-21 Thread yongzhi.shao (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17819475#comment-17819475
 ] 

yongzhi.shao edited comment on HIVE-27898 at 2/22/24 6:02 AM:
--

[~zhangbutao] 

I tried the master branch, using the latest updated code today, and the problem 
persists.

Please note that in the statistics of Tez, the anomaly in the statistics of the 
Submit Plan metric has been fixed. This proves that I am using the latest 
version of the code.

Also, I noticed in your previous test that one of the tables is not an ICEBERG 
table. This SQL executes normal internal tables without problems. It is the 
ICEBERG table that has problems executing.
{code:java}
0: jdbc:hive2://smaster01:1> select version();
INFO  : Compiling 
command(queryId=hive_20240222134713_2aaf182e-be55-4946-9a7c-c19d4a047e13): 
select version()
INFO  : Semantic Analysis Completed (retrial = false)
INFO  : Created Hive schema: Schema(fieldSchemas:[FieldSchema(name:_c0, 
type:string, comment:null)], properties:null)
INFO  : Completed compiling 
command(queryId=hive_20240222134713_2aaf182e-be55-4946-9a7c-c19d4a047e13); Time 
taken: 1.974 seconds
INFO  : Concurrency mode is disabled, not creating a lock manager
INFO  : Executing 
command(queryId=hive_20240222134713_2aaf182e-be55-4946-9a7c-c19d4a047e13): 
select version()
INFO  : Completed executing 
command(queryId=hive_20240222134713_2aaf182e-be55-4946-9a7c-c19d4a047e13); Time 
taken: 0.004 seconds
_c0  4.0.0-beta-2-SNAPSHOT r6e061e6559522c8a060c1b55439ada0001bf5e5d




0: jdbc:hive2://smaster01:1>  select * from (select * from 
iceberg_dwd.test_data_04 limit 10) s1;
INFO  : Compiling 
command(queryId=hive_20240222134526_87055172-e808-4726-acdf-89f6b8b815e6): 
select * from (select * from iceberg_dwd.test_data_04 limit 10) s1
INFO  : Semantic Analysis Completed (retrial = false)
INFO  : Created Hive schema: Schema(fieldSchemas:[FieldSchema(name:s1.id, 
type:string, comment:null), FieldSchema(name:s1.name, type:string, 
comment:null)], properties:null)
INFO  : Completed compiling 
command(queryId=hive_20240222134526_87055172-e808-4726-acdf-89f6b8b815e6); Time 
taken: 1.007 seconds
INFO  : Concurrency mode is disabled, not creating a lock manager
INFO  : Executing 
command(queryId=hive_20240222134526_87055172-e808-4726-acdf-89f6b8b815e6): 
select * from (select * from iceberg_dwd.test_data_04 limit 10) s1
INFO  : Query ID = hive_20240222134526_87055172-e808-4726-acdf-89f6b8b815e6
INFO  : Total jobs = 1
INFO  : Launching Job 1 out of 1
INFO  : Starting task [Stage-1:MAPRED] in serial mode
INFO  : Subscribed to counters: [] for queryId: 
hive_20240222134526_87055172-e808-4726-acdf-89f6b8b815e6
INFO  : Tez session hasn't been created yet. Opening session
INFO  : Dag name: select * from (select..a_04 limit 10) s1 (Stage-1)
INFO  : HS2 Host: [smaster01], Query ID: 
[hive_20240222134526_87055172-e808-4726-acdf-89f6b8b815e6], Dag ID: 
[dag_1706163520799_39421_1], DAG Session ID: [application_1706163520799_39421]
INFO  : Status: Running (Executing on YARN cluster with App id 
application_1706163520799_39421)--
        VERTICES      MODE        STATUS  TOTAL  COMPLETED  RUNNING  PENDING  
FAILED  KILLED  
--
Map 1 .. container     SUCCEEDED      1          1        0        0    
   0       0  
Reducer 2 .. container     SUCCEEDED      1          1        0        0    
   0       0  
--
VERTICES: 02/02  [==>>] 100%  ELAPSED TIME: 20.10 s    
--
INFO  : Status: DAG finished successfully in 19.90 seconds
INFO  : DAG ID: dag_1706163520799_39421_1
INFO  : 
INFO  : Query Execution Summary
INFO  : 
--
INFO  : OPERATION                            DURATION
INFO  : 
--
INFO  : Compile Query                           1.01s
INFO  : Prepare Plan                           16.62s
INFO  : Get Query Coordinator (AM)              0.03s
INFO  : Submit Plan                             0.54s
INFO  : Start DAG                               0.07s
INFO  : Run DAG                                19.90s
INFO  : 
--
INFO  : 
INFO  : Task Execution Summary
INFO  : 
--
INFO  :   VERTICES      DURATION(ms)   CPU_TIME(ms)    GC_TIME(ms)   

[jira] [Comment Edited] (HIVE-27898) HIVE4 can't use ICEBERG table in subqueries

2024-02-21 Thread yongzhi.shao (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17819475#comment-17819475
 ] 

yongzhi.shao edited comment on HIVE-27898 at 2/22/24 5:55 AM:
--

[~zhangbutao] 

I tried the master branch, using the latest updated code today, and the problem 
persists.

Please note that the earlier anomaly in the Tez Submit Plan statistic has been 
fixed, which confirms that I am running the latest version of the code.
{code:java}
0: jdbc:hive2://smaster01:1> select version();
INFO  : Compiling 
command(queryId=hive_20240222134713_2aaf182e-be55-4946-9a7c-c19d4a047e13): 
select version()
INFO  : Semantic Analysis Completed (retrial = false)
INFO  : Created Hive schema: Schema(fieldSchemas:[FieldSchema(name:_c0, 
type:string, comment:null)], properties:null)
INFO  : Completed compiling 
command(queryId=hive_20240222134713_2aaf182e-be55-4946-9a7c-c19d4a047e13); Time 
taken: 1.974 seconds
INFO  : Concurrency mode is disabled, not creating a lock manager
INFO  : Executing 
command(queryId=hive_20240222134713_2aaf182e-be55-4946-9a7c-c19d4a047e13): 
select version()
INFO  : Completed executing 
command(queryId=hive_20240222134713_2aaf182e-be55-4946-9a7c-c19d4a047e13); Time 
taken: 0.004 seconds
_c0  4.0.0-beta-2-SNAPSHOT r6e061e6559522c8a060c1b55439ada0001bf5e5d




0: jdbc:hive2://smaster01:1>  select * from (select * from 
iceberg_dwd.test_data_04 limit 10) s1;
INFO  : Compiling 
command(queryId=hive_20240222134526_87055172-e808-4726-acdf-89f6b8b815e6): 
select * from (select * from iceberg_dwd.test_data_04 limit 10) s1
INFO  : Semantic Analysis Completed (retrial = false)
INFO  : Created Hive schema: Schema(fieldSchemas:[FieldSchema(name:s1.id, 
type:string, comment:null), FieldSchema(name:s1.name, type:string, 
comment:null)], properties:null)
INFO  : Completed compiling 
command(queryId=hive_20240222134526_87055172-e808-4726-acdf-89f6b8b815e6); Time 
taken: 1.007 seconds
INFO  : Concurrency mode is disabled, not creating a lock manager
INFO  : Executing 
command(queryId=hive_20240222134526_87055172-e808-4726-acdf-89f6b8b815e6): 
select * from (select * from iceberg_dwd.test_data_04 limit 10) s1
INFO  : Query ID = hive_20240222134526_87055172-e808-4726-acdf-89f6b8b815e6
INFO  : Total jobs = 1
INFO  : Launching Job 1 out of 1
INFO  : Starting task [Stage-1:MAPRED] in serial mode
INFO  : Subscribed to counters: [] for queryId: 
hive_20240222134526_87055172-e808-4726-acdf-89f6b8b815e6
INFO  : Tez session hasn't been created yet. Opening session
INFO  : Dag name: select * from (select..a_04 limit 10) s1 (Stage-1)
INFO  : HS2 Host: [smaster01], Query ID: 
[hive_20240222134526_87055172-e808-4726-acdf-89f6b8b815e6], Dag ID: 
[dag_1706163520799_39421_1], DAG Session ID: [application_1706163520799_39421]
INFO  : Status: Running (Executing on YARN cluster with App id 
application_1706163520799_39421)--
        VERTICES      MODE        STATUS  TOTAL  COMPLETED  RUNNING  PENDING  
FAILED  KILLED  
--
Map 1 .. container     SUCCEEDED      1          1        0        0    
   0       0  
Reducer 2 .. container     SUCCEEDED      1          1        0        0    
   0       0  
--
VERTICES: 02/02  [==>>] 100%  ELAPSED TIME: 20.10 s    
--
INFO  : Status: DAG finished successfully in 19.90 seconds
INFO  : DAG ID: dag_1706163520799_39421_1
INFO  : 
INFO  : Query Execution Summary
INFO  : 
--
INFO  : OPERATION                            DURATION
INFO  : 
--
INFO  : Compile Query                           1.01s
INFO  : Prepare Plan                           16.62s
INFO  : Get Query Coordinator (AM)              0.03s
INFO  : Submit Plan                             0.54s
INFO  : Start DAG                               0.07s
INFO  : Run DAG                                19.90s
INFO  : 
--
INFO  : 
INFO  : Task Execution Summary
INFO  : 
--
INFO  :   VERTICES      DURATION(ms)   CPU_TIME(ms)    GC_TIME(ms)   
INPUT_RECORDS   OUTPUT_RECORDS
INFO  : 
--
INFO  :      Map 1              0.00         10,290            290       
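
For reference, the failure pattern above can be sketched as a minimal HiveQL 
repro (the DDL is an assumption; the actual definition of 
{{iceberg_dwd.test_data_04}} is not shown in this thread):
{code:sql}
-- Hypothetical setup; the real table definition is not shown in the thread.
CREATE TABLE iceberg_dwd.test_data_04 (id string, name string) STORED BY ICEBERG;
INSERT INTO iceberg_dwd.test_data_04 VALUES ('1', 'a'), ('2', 'b');

-- A direct scan returns the rows as expected.
SELECT * FROM iceberg_dwd.test_data_04 LIMIT 10;

-- Wrapping the same scan in a subquery returns 0 rows on the affected builds
-- (Reducer 2 reports OUTPUT_RECORDS = 0 in the log above).
SELECT * FROM (SELECT * FROM iceberg_dwd.test_data_04 LIMIT 10) s1;
{code}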



[jira] [Comment Edited] (HIVE-26435) Add method for collecting HMS meta summary

2024-02-21 Thread Zhihua Deng (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17819470#comment-17819470
 ] 

Zhihua Deng edited comment on HIVE-26435 at 2/22/24 4:55 AM:
-

Fix has been merged into master. Thank you [~danielzhu] and [~ruyi.zheng] for 
the work! 


was (Author: dengzh):
Fix has been merged into master, Thank you [~danielzhu] for the work! 

> Add method for collecting HMS meta summary
> --
>
> Key: HIVE-26435
> URL: https://issues.apache.org/jira/browse/HIVE-26435
> Project: Hive
>  Issue Type: New Feature
>Reporter: Ruyi Zheng
>Assignee: Hongdan Zhu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Hive Metastore currently lacks visibility into its metadata. This work 
> includes enhancing the Hive Metatool to include an option (JSON, CONSOLE) to 
> print a summary (catalog name, database name, table name, partition column 
> count, number of rows, table type, file type, compression type, total data 
> size, etc.). 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HIVE-26435) Add method for collecting HMS meta summary

2024-02-21 Thread Zhihua Deng (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihua Deng resolved HIVE-26435.

Fix Version/s: 4.0.0
 Assignee: Hongdan Zhu  (was: Naveen Gangam)
   Resolution: Fixed

Fix has been merged into master. Thank you [~danielzhu] for the work! 

> Add method for collecting HMS meta summary
> --
>
> Key: HIVE-26435
> URL: https://issues.apache.org/jira/browse/HIVE-26435
> Project: Hive
>  Issue Type: New Feature
>Reporter: Ruyi Zheng
>Assignee: Hongdan Zhu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Hive Metastore currently lacks visibility into its metadata. This work 
> includes enhancing the Hive Metatool to include an option (JSON, CONSOLE) to 
> print a summary (catalog name, database name, table name, partition column 
> count, number of rows, table type, file type, compression type, total data 
> size, etc.). 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-26435) Add method for collecting HMS meta summary

2024-02-21 Thread Zhihua Deng (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihua Deng updated HIVE-26435:
---
Summary: Add method for collecting HMS meta summary  (was: HMS Summary)

> Add method for collecting HMS meta summary
> --
>
> Key: HIVE-26435
> URL: https://issues.apache.org/jira/browse/HIVE-26435
> Project: Hive
>  Issue Type: New Feature
>Reporter: Ruyi Zheng
>Assignee: Naveen Gangam
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Hive Metastore currently lacks visibility into its metadata. This work 
> includes enhancing the Hive Metatool to include an option (JSON, CONSOLE) to 
> print a summary (catalog name, database name, table name, partition column 
> count, number of rows, table type, file type, compression type, total data 
> size, etc.). 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HIVE-27778) Alter table command gives error after compute stats is run with Impala

2024-02-21 Thread Zhihua Deng (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihua Deng resolved HIVE-27778.

Fix Version/s: 4.0.0
   Resolution: Fixed

Fix has been merged into master. Thank you [~dkuzmenko] and [~zhangbutao] for 
the review!

> Alter table command gives error after compute stats is run with Impala
> ---
>
> Key: HIVE-27778
> URL: https://issues.apache.org/jira/browse/HIVE-27778
> Project: Hive
>  Issue Type: Bug
>Reporter: Kokila N
>Assignee: Zhihua Deng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> A Hive partitioned table's column stats are stored at partition level, in 
> PART_COL_STATS in sys.
> When a "compute stats" query is run in Impala on a Hive partitioned table, it 
> generates column stats at table level, which are stored in TAB_COL_STATS.
> So, executing an "alter table ... rename to ..." after an Impala compute 
> stats throws an error.
> {code:java}
> ERROR : Failed
> org.apache.hadoop.hive.ql.metadata.HiveException: Unable to alter table. 
> Cannot change stats state for a transactional table default.parqtest without 
> providing the transactional write state for verification (new write ID 6, 
> valid write IDs null; current state null; new state {} {code}
> The column stats generated from Impala need to be deleted for the alter 
> command to work.
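
The failing sequence described above can be sketched as follows (a hedged 
illustration; {{default.parqtest}} is taken from the error message, and the 
rename target is a placeholder):
{code:sql}
-- In Impala (writes table-level column stats into TAB_COL_STATS):
COMPUTE STATS default.parqtest;

-- Then in Hive, on the transactional partitioned table:
ALTER TABLE default.parqtest RENAME TO parqtest_renamed;
-- fails with "Unable to alter table. Cannot change stats state for a
-- transactional table ... without providing the transactional write state"
-- until the Impala-written table-level stats are removed.
{code}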



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HIVE-28085) YarnQueueHelper fails to access HTTPS enabled YARN WebService

2024-02-21 Thread Prabhu Joseph (Jira)
Prabhu Joseph created HIVE-28085:


 Summary: YarnQueueHelper fails to access HTTPS enabled YARN 
WebService
 Key: HIVE-28085
 URL: https://issues.apache.org/jira/browse/HIVE-28085
 Project: Hive
  Issue Type: Bug
Affects Versions: 4.0.0-alpha-1
Reporter: Prabhu Joseph


YarnQueueHelper fails to access an HTTPS-enabled YARN WebService with the error below.
{code:java}
Caused by: sun.security.provider.certpath.SunCertPathBuilderException: unable 
to find valid certification path to requested target
at 
sun.security.provider.certpath.SunCertPathBuilder.build(SunCertPathBuilder.java:148)
 ~[?:1.8.0_402]
at 
sun.security.provider.certpath.SunCertPathBuilder.engineBuild(SunCertPathBuilder.java:129)
 ~[?:1.8.0_402]
at java.security.cert.CertPathBuilder.build(CertPathBuilder.java:280) 
~[?:1.8.0_402]
at sun.security.validator.PKIXValidator.doBuild(PKIXValidator.java:451) 
~[?:1.8.0_402]
at 
sun.security.validator.PKIXValidator.engineValidate(PKIXValidator.java:323) 
~[?:1.8.0_402]
at sun.security.validator.Validator.validate(Validator.java:271) 
~[?:1.8.0_402]
at 
sun.security.ssl.X509TrustManagerImpl.validate(X509TrustManagerImpl.java:315) 
~[?:1.8.0_402]
at 
sun.security.ssl.X509TrustManagerImpl.checkTrusted(X509TrustManagerImpl.java:223)
 ~[?:1.8.0_402]
at 
sun.security.ssl.X509TrustManagerImpl.checkServerTrusted(X509TrustManagerImpl.java:129)
 ~[?:1.8.0_402]
at 
sun.security.ssl.CertificateMessage$T13CertificateConsumer.checkServerCerts(CertificateMessage.java:1340)
 ~[?:1.8.0_402]
at 
sun.security.ssl.CertificateMessage$T13CertificateConsumer.onConsumeCertificate(CertificateMessage.java:1231)
 ~[?:1.8.0_402]
at 
sun.security.ssl.CertificateMessage$T13CertificateConsumer.consume(CertificateMessage.java:1174)
 ~[?:1.8.0_402]
at sun.security.ssl.SSLHandshake.consume(SSLHandshake.java:377) 
~[?:1.8.0_402]
at 
sun.security.ssl.HandshakeContext.dispatch(HandshakeContext.java:444) 
~[?:1.8.0_402]
at 
sun.security.ssl.HandshakeContext.dispatch(HandshakeContext.java:422) 
~[?:1.8.0_402]
at 
sun.security.ssl.TransportContext.dispatch(TransportContext.java:182) 
~[?:1.8.0_402]
at sun.security.ssl.SSLTransport.decode(SSLTransport.java:152) 
~[?:1.8.0_402]
at sun.security.ssl.SSLSocketImpl.decode(SSLSocketImpl.java:1401) 
~[?:1.8.0_402]
at 
sun.security.ssl.SSLSocketImpl.readHandshakeRecord(SSLSocketImpl.java:1309) 
~[?:1.8.0_402]
at 
sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:440) 
~[?:1.8.0_402]
at 
sun.net.www.protocol.https.HttpsClient.afterConnect(HttpsClient.java:559) 
~[?:1.8.0_402]
at 
sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.connect(AbstractDelegateHttpsURLConnection.java:197)
 ~[?:1.8.0_402]
at 
sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1572)
 ~[?:1.8.0_402]
at 
sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1500)
 ~[?:1.8.0_402]
at 
java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:480) 
~[?:1.8.0_402]
at 
sun.net.www.protocol.https.HttpsURLConnectionImpl.getResponseCode(HttpsURLConnectionImpl.java:352)
 ~[?:1.8.0_402]
at 
org.apache.hadoop.hive.ql.exec.tez.YarnQueueHelper.checkQueueAccessFromSingleRm(YarnQueueHelper.java:155)
 ~[hive-exec-3.1.3-amzn-8.jar:3.1.3-amzn-8] {code}
The fix (HIVE-23062) corrected the URL but did not use the SSLFactory to 
set up the HttpURLConnection.
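
A sketch of the idea, not Hive's actual code: instead of relying on the JVM's 
default trust material (which produces the SunCertPathBuilderException above), 
the connection should be handed an SSLSocketFactory built from the cluster's 
configured truststore. In Hive this would come from Hadoop's SSLFactory; the 
JDK-only sketch below uses javax.net.ssl directly, and all class and method 
names here are stand-ins.

```java
import javax.net.ssl.HttpsURLConnection;
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLSocketFactory;
import javax.net.ssl.TrustManagerFactory;
import java.net.URL;
import java.security.KeyStore;

public class TrustedConnectionSketch {
  // Build an SSLSocketFactory that trusts the certificates in the given
  // truststore, rather than the JVM's default cacerts.
  static SSLSocketFactory socketFactory(KeyStore trustStore) throws Exception {
    TrustManagerFactory tmf =
        TrustManagerFactory.getInstance(TrustManagerFactory.getDefaultAlgorithm());
    tmf.init(trustStore);
    SSLContext ctx = SSLContext.getInstance("TLS");
    ctx.init(null, tmf.getTrustManagers(), null);
    return ctx.getSocketFactory();
  }

  // Open the YARN web service URL with the cluster truststore applied to the
  // HTTPS connection before any handshake happens.
  static HttpsURLConnection open(URL url, KeyStore trustStore) throws Exception {
    HttpsURLConnection conn = (HttpsURLConnection) url.openConnection();
    conn.setSSLSocketFactory(socketFactory(trustStore));
    return conn;
  }
}
```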





[jira] [Commented] (HIVE-22738) CVE-2019-0205

2024-02-21 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17819320#comment-17819320
 ] 

Dongjoon Hyun commented on HIVE-22738:
--

For the record, THRIFT-4053 fixed the Java implementation for CVE-2019-0205 in 
0.11.0.

> CVE-2019-0205
> -
>
> Key: HIVE-22738
> URL: https://issues.apache.org/jira/browse/HIVE-22738
> Project: Hive
>  Issue Type: Bug
>  Components: Thrift API
>Reporter: Laurent Goujon
>Priority: Major
>
> There has been a CVE issued for a Thrift vulnerability which might impact 
> Hive. The CVE is 
> [CVE-2019-0205|https://nvd.nist.gov/vuln/detail/CVE-2019-0205]; it impacts 
> both clients and servers and might cause a denial of service through an 
> infinite loop.





[jira] [Resolved] (HIVE-28064) Add cause to ParseException for diagnosability purposes

2024-02-21 Thread Stamatis Zampetakis (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stamatis Zampetakis resolved HIVE-28064.

Fix Version/s: 4.1.0
   Resolution: Fixed

Fixed in 
https://github.com/apache/hive/commit/6e061e6559522c8a060c1b55439ada0001bf5e5d. 
Thanks for the reviews [~okumin] [~zhangbutao]!

> Add cause to ParseException for diagnosability purposes
> ---
>
> Key: HIVE-28064
> URL: https://issues.apache.org/jira/browse/HIVE-28064
> Project: Hive
>  Issue Type: Improvement
>  Components: Parser
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.1.0
>
>
> The 
> [ParseException|https://github.com/apache/hive/blob/4b01a607091581ac9bdb372f8b47c1efca4d4bb4/parser/src/java/org/apache/hadoop/hive/ql/parse/ParseException.java]
>  contains high level information about problems encountered during parsing 
> but currently the stacktrace is pretty shallow. 
> {code:sql}
> select * from author where  true > fname in ('Alex','Rob') 
> {code}
> Currently the query above will fail with the stacktrace below:
> {noformat}
> org.apache.hadoop.hive.ql.parse.ParseException: line 4:41 cannot recognize 
> input near 'in' '(' ''Alex'' in expression specification
> at 
> org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:125)
> at 
> org.apache.hadoop.hive.ql.parse.ParseUtils.parse(ParseUtils.java:97)
> at 
> org.apache.hadoop.hive.ql.parse.ParseUtils.parse(ParseUtils.java:89)
> at org.apache.hadoop.hive.ql.Compiler.parse(Compiler.java:172)
> at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:105)
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:519)
> at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:471)
> at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:436)
> at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:430)
> at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:121)
> at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:227)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:257)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processCmd1(CliDriver.java:201)
> at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:127)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:425)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:356)
> {noformat}
> The end user gets a hint about what the error might be, but the developer 
> has no way to tell how far the parser got into the given statement and which 
> grammar rule failed.
> In some cases we are catching the RecognitionException, which shows exactly 
> at which point the parser failed to advance, so we could pass it as a cause 
> inside the ParseException. With this change the stacktrace would be similar 
> to the one below:
> {noformat}
> org.apache.hadoop.hive.ql.parse.ParseException: line 4:41 cannot recognize 
> input near 'in' '(' ''Alex'' in expression specification
> at 
> org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:125)
> at 
> org.apache.hadoop.hive.ql.parse.ParseUtils.parse(ParseUtils.java:97)
> at 
> org.apache.hadoop.hive.ql.parse.ParseUtils.parse(ParseUtils.java:89)
> at org.apache.hadoop.hive.ql.Compiler.parse(Compiler.java:172)
> at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:105)
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:519)
> at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:471)
> at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:436)
> at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:430)
> at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:121)
> at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:227)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:257)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processCmd1(CliDriver.java:201)
> at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:127)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:425)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:356)
> Caused by: NoViableAltException(173@[()* loopback of 739:5: (equal= 
> precedenceEqualOperator p= precedenceSimilarExpression -> ^( $equal $p) 
> |dist= precedenceDistinctOperator p= 
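
The mechanism the ticket proposes is plain Java exception chaining: construct 
the ParseException with the RecognitionException as its cause, so the deeper 
parser state survives in the stacktrace. A minimal self-contained sketch; the 
class names below are stand-ins, not Hive's actual parse classes.

```java
// Sketch of exception chaining for diagnosability; RecognitionError and
// ParseError are hypothetical stand-ins for ANTLR's RecognitionException
// and Hive's ParseException.
public class ChainedParseExceptionSketch {
  static class RecognitionError extends Exception {
    RecognitionError(String msg) { super(msg); }
  }

  static class ParseError extends Exception {
    ParseError(String msg, Throwable cause) { super(msg, cause); }
  }

  // Wrapping keeps the high-level message while preserving the low-level
  // failure point as the cause ("Caused by: ..." in the stacktrace).
  static ParseError wrap(RecognitionError re) {
    return new ParseError("cannot recognize input", re);
  }
}
```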

[jira] [Commented] (HIVE-27598) Enhance alter table compact to work for partitioned tables without specifying any partition

2024-02-21 Thread Stamatis Zampetakis (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17819281#comment-17819281
 ] 

Stamatis Zampetakis commented on HIVE-27598:


I have been reviewing HIVE-27848 and noticed that Initiator was split into 
multiple classes as part of this ticket. I understand that the idea was to 
reuse code that was already there, but the current design, which relies on 
inheritance, is not ideal.

Due to inheritance, the {{InitiatorBase}} class becomes a Thread, which 
doesn't really make sense and adds overhead every time we instantiate it. 
Moreover, the only class that currently extends {{InitiatorBase}} is 
{{Initiator}}, and I hardly see how we could make other extensions of 
{{InitiatorBase}}; the code becomes complex, and any subtle change in 
{{InitiatorBase}} may have unpredictable effects on {{Initiator}}. Having a 
"Base" class that is not really meant to be extended, with no instructions on 
how to do so, is a weird design pattern.

If someone finds time it would be nice to revisit this decision and remove 
inheritance and {{InitiatorBase}} from the picture.
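
For illustration only (these names are hypothetical, not Hive's actual 
classes), the suggestion amounts to preferring composition: the scheduling 
thread holds a plain collaborator instead of the shared logic living in a 
superclass that extends Thread.

```java
// Hypothetical sketch of composition over inheritance: the reusable worker
// logic is a plain object, and only the scheduling wrapper is a Runnable.
public class CompositionSketch {
  // The reusable logic that would otherwise sit in a "Base" superclass.
  static class CompactionChecker {
    int eligible(int[] failedCounts, int threshold) {
      int n = 0;
      for (int c : failedCounts) {
        if (c < threshold) n++;  // below the failure threshold: still eligible
      }
      return n;
    }
  }

  // The loop owns a checker instead of inheriting from it, so the checker
  // can be instantiated and tested without any Thread overhead.
  static class InitiatorLoop implements Runnable {
    private final CompactionChecker checker = new CompactionChecker();
    volatile int lastEligible;

    public void run() {
      lastEligible = checker.eligible(new int[]{0, 3, 1}, 2);
    }
  }
}
```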

> Enhance alter table compact to work for partitioned tables without specifying 
> any partition
> ---
>
> Key: HIVE-27598
> URL: https://issues.apache.org/jira/browse/HIVE-27598
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Reporter: Taraka Rama Rao Lethavadla
>Assignee: Taraka Rama Rao Lethavadla
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0-beta-1
>
>
> *Current implementation:* for a partitioned table, the command below
>  
> {noformat}
> alter table PARTITIONED_T compact 'minor';{noformat}
> throws the error
> {code:java}
> You must specify a partition to compact for partitioned tables{code}
> {*}Problem{*}: Say a table contains thousands of partitions, and because of 
> some issue such as file permissions, automatic compaction marks all or some 
> partitions as failed and never picks them for compaction again. Only after 
> we see performance degrading do we find out that there is a problem with 
> compaction.
> There are two ways to solve it: adjust the failure thresholds and restart 
> HMS to pick up the changes, which does not work for someone who cannot 
> restart HMS; or execute an alter table ... compact mentioning each partition 
> individually.
> {*}Solution{*}:
> In this Jira we plan to allow the alter table ... compact command on a 
> partitioned table without specifying any partition name.
> The proposed solution is to list all the partitions, find those eligible for 
> compaction, and put an entry for each into the compaction queue.





[jira] [Resolved] (HIVE-28081) Code refine on ClearDanglingScratchDir::removeLocalTmpFiles

2024-02-21 Thread Butao Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Butao Zhang resolved HIVE-28081.

Fix Version/s: 4.1.0
   Resolution: Fixed

Merged into master branch.

Thanks [~okumin] & [~zabetak] for the review!!!

> Code refine on ClearDanglingScratchDir::removeLocalTmpFiles
> ---
>
> Key: HIVE-28081
> URL: https://issues.apache.org/jira/browse/HIVE-28081
> Project: Hive
>  Issue Type: Improvement
>Reporter: Butao Zhang
>Assignee: Butao Zhang
>Priority: Trivial
>  Labels: pull-request-available
> Fix For: 4.1.0
>
>






[jira] [Resolved] (HIVE-28015) Iceberg: Add identifier-field-ids support in Hive

2024-02-21 Thread Butao Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Butao Zhang resolved HIVE-28015.

Fix Version/s: 4.1.0
   Resolution: Fixed

Merged into master branch.

Thanks [~dkuzmenko] for the review!!!

> Iceberg: Add identifier-field-ids support in Hive
> -
>
> Key: HIVE-28015
> URL: https://issues.apache.org/jira/browse/HIVE-28015
> Project: Hive
>  Issue Type: Improvement
>  Components: Iceberg integration
>Affects Versions: 4.0.0
>Reporter: Denys Kuzmenko
>Assignee: Butao Zhang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.1.0
>
>
> Some writer engines require primary keys on a table so that they can use them 
> for writing equality deletes (only the PK cols are written to the eq-delete 
> files).
> Hive currently doesn't reject setting PKs for Iceberg tables; it just 
> ignores them. This succeeds:
> {code:java}
> create table ice_pk (i int, j int, primary key(i)) stored by iceberg;
> {code}





[jira] [Updated] (HIVE-28082) HiveAggregateReduceFunctionsRule could generate an inconsistent result

2024-02-21 Thread okumin (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

okumin updated HIVE-28082:
--
Status: Patch Available  (was: Open)

I created a PR.

https://github.com/apache/hive/pull/5091

> HiveAggregateReduceFunctionsRule could generate an inconsistent result
> --
>
> Key: HIVE-28082
> URL: https://issues.apache.org/jira/browse/HIVE-28082
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Affects Versions: 4.0.0-beta-1
>Reporter: okumin
>Assignee: okumin
>Priority: Major
>
> HiveAggregateReduceFunctionsRule translates AVG, STDDEV_POP, STDDEV_SAMP, 
> VAR_POP, and VAR_SAMP. Those UDFs accept string types and try to decode them 
> as floating-point values, and undecodable values may exist.
> We found that this can cause inconsistent behavior with and without CBO.
> {code:java}
> 0: jdbc:hive2://hive-hiveserver2:1/defaul> SELECT AVG('text');
> ...
> +--+
> | _c0  |
> +--+
> | 0.0  |
> +--+
> 1 row selected (18.229 seconds)
> 0: jdbc:hive2://hive-hiveserver2:1/defaul> set hive.cbo.enable=false;
> No rows affected (0.013 seconds)
> 0: jdbc:hive2://hive-hiveserver2:1/defaul> SELECT AVG('text');
> ...
> +---+
> |  _c0  |
> +---+
> | NULL  |
> +---+ {code}
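
The shape of the discrepancy can be reproduced outside Hive: if the rewritten 
plan effectively computes SUM/COUNT, an undecodable string can contribute 0 to 
the sum while still being counted, whereas a direct AVG treats it as NULL and 
skips it. The JDK-only sketch below models both semantics; it is an assumption 
about the rewrite's arithmetic for illustration, not Hive's actual code.

```java
public class AvgSemanticsSketch {
  // Hive-style cast semantics: an undecodable string becomes null.
  static Double toDouble(String s) {
    try { return Double.parseDouble(s); } catch (NumberFormatException e) { return null; }
  }

  // AVG that skips nulls entirely: all-undecodable input yields null
  // (matching the non-CBO result above).
  static Double avgSkippingNulls(String[] vals) {
    double sum = 0; long count = 0;
    for (String v : vals) {
      Double d = toDouble(v);
      if (d != null) { sum += d; count++; }
    }
    return count == 0 ? null : sum / count;
  }

  // SUM/COUNT-style rewrite where an undecodable value contributes 0 to the
  // sum but is still counted as a row, yielding 0.0 instead of null
  // (a model of the inconsistent CBO result above).
  static Double avgAsSumOverCount(String[] vals) {
    double sum = 0; long count = 0;
    for (String v : vals) {
      Double d = toDouble(v);
      sum += (d == null ? 0 : d);
      count++;
    }
    return count == 0 ? null : sum / count;
  }
}
```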





[jira] [Comment Edited] (HIVE-27898) HIVE4 can't use ICEBERG table in subqueries

2024-02-21 Thread yongzhi.shao (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17819172#comment-17819172
 ] 

yongzhi.shao edited comment on HIVE-27898 at 2/21/24 10:27 AM:
---

That's strange. Our execution plans are different?

you: one map task

me: one map -> one reduce
{code:java}
--
        VERTICES      MODE        STATUS  TOTAL  COMPLETED  RUNNING  PENDING  
FAILED  KILLED  
--
Map 1 .. container     SUCCEEDED      1          1        0        0    
   0       0  
Reducer 2 .. container     SUCCEEDED      1          1        0        0    
   0       0  
--
VERTICES: 02/02  [==>>] 100%  ELAPSED TIME: 26.27 s    
--
INFO  : Status: DAG finished successfully in 26.06 seconds
INFO  : DAG ID: dag_1706163520799_38211_2
INFO  : 
INFO  : Query Execution Summary
INFO  : 
--
INFO  : OPERATION                            DURATION
INFO  : 
--
INFO  : Compile Query                           0.00s
INFO  : Prepare Plan                            0.00s
INFO  : Get Query Coordinator (AM)              0.00s
INFO  : Submit Plan                         1708509885.24s
INFO  : Start DAG                               0.04s
INFO  : Run DAG                                26.05s
INFO  : 
--
INFO  : 
INFO  : Task Execution Summary
INFO  : 
--
INFO  :   VERTICES      DURATION(ms)   CPU_TIME(ms)    GC_TIME(ms)   
INPUT_RECORDS   OUTPUT_RECORDS
INFO  : 
--
INFO  :      Map 1           3949.00          8,360            118              
 2                2
INFO  :  Reducer 2              0.00            820              0              
 2                0
INFO  : 
--
INFO  : 
INFO  : org.apache.tez.common.counters.DAGCounter:
INFO  :    NUM_SUCCEEDED_TASKS: 2
INFO  :    TOTAL_LAUNCHED_TASKS: 2
INFO  :    RACK_LOCAL_TASKS: 1
INFO  :    AM_CPU_MILLISECONDS: 2650
INFO  :    WALL_CLOCK_MILLIS: 3912
INFO  :    AM_GC_TIME_MILLIS: 0
INFO  :    INITIAL_HELD_CONTAINERS: 0
INFO  :    TOTAL_CONTAINERS_USED: 1
INFO  :    TOTAL_CONTAINER_ALLOCATION_COUNT: 1
INFO  :    TOTAL_CONTAINER_LAUNCH_COUNT: 1
INFO  :    TOTAL_CONTAINER_REUSE_COUNT: 1
INFO  : File System Counters:
INFO  :    FILE_BYTES_READ: 0
INFO  :    FILE_BYTES_WRITTEN: 0
INFO  :    FILE_READ_OPS: 0
INFO  :    FILE_LARGE_READ_OPS: 0
INFO  :    FILE_WRITE_OPS: 0
INFO  :    HDFS_BYTES_READ: 1210
INFO  :    HDFS_BYTES_WRITTEN: 121
INFO  :    HDFS_READ_OPS: 8
INFO  :    HDFS_LARGE_READ_OPS: 0
INFO  :    HDFS_WRITE_OPS: 2
INFO  : org.apache.tez.common.counters.TaskCounter:
INFO  :    SPILLED_RECORDS: 0
INFO  :    NUM_SHUFFLED_INPUTS: 0
INFO  :    NUM_FAILED_SHUFFLE_INPUTS: 0
INFO  :    GC_TIME_MILLIS: 118
INFO  :    CPU_MILLISECONDS: 9180
INFO  :    WALL_CLOCK_MILLISECONDS: 3726
INFO  :    PHYSICAL_MEMORY_BYTES: 3435134976
INFO  :    VIRTUAL_MEMORY_BYTES: 7564333056
INFO  :    COMMITTED_HEAP_BYTES: 3435134976
INFO  :    INPUT_RECORDS_PROCESSED: 4
INFO  :    INPUT_SPLIT_LENGTH_BYTES: 880
INFO  :    OUTPUT_RECORDS: 2
INFO  :    APPROXIMATE_INPUT_RECORDS: 2
INFO  :    OUTPUT_LARGE_RECORDS: 0
INFO  :    OUTPUT_BYTES: 8
INFO  :    OUTPUT_BYTES_WITH_OVERHEAD: 18
INFO  :    OUTPUT_BYTES_PHYSICAL: 46
INFO  :    ADDITIONAL_SPILLS_BYTES_WRITTEN: 0
INFO  :    ADDITIONAL_SPILLS_BYTES_READ: 0
INFO  :    ADDITIONAL_SPILL_COUNT: 0
INFO  :    SHUFFLE_BYTES: 0
INFO  :    SHUFFLE_BYTES_DECOMPRESSED: 0
INFO  :    SHUFFLE_BYTES_TO_MEM: 0
INFO  :    SHUFFLE_BYTES_TO_DISK: 0
INFO  :    SHUFFLE_BYTES_DISK_DIRECT: 0
INFO  :    SHUFFLE_PHASE_TIME: 58
INFO  :    FIRST_EVENT_RECEIVED: 58
INFO  :    LAST_EVENT_RECEIVED: 58
INFO  :    DATA_BYTES_VIA_EVENT: 22
INFO  : HIVE:
INFO  :    CREATED_FILES: 1
INFO  :    DESERIALIZE_ERRORS: 0
INFO  :    RECORDS_IN_Map_1: 2
INFO  :    RECORDS_OUT_0: 2
INFO  :    RECORDS_OUT_INTERMEDIATE_Map_1: 2
INFO  :    RECORDS_OUT_INTERMEDIATE_Reducer_2: 0
INFO  :    RECORDS_OUT_OPERATOR_FS_13: 2
INFO  :    RECORDS_OUT_OPERATOR_LIM_11: 2
INFO  :    RECORDS_OUT_OPERATOR_LIM_8: 2
INFO  :    RECORDS_OUT_OPERATOR_MAP_0: 0
INFO  :    

[jira] [Comment Edited] (HIVE-27898) HIVE4 can't use ICEBERG table in subqueries

2024-02-21 Thread yongzhi.shao (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17819172#comment-17819172
 ] 

yongzhi.shao edited comment on HIVE-27898 at 2/21/24 10:26 AM:
---

That's strange. Our execution plans are different?
{code:java}
--
        VERTICES      MODE        STATUS  TOTAL  COMPLETED  RUNNING  PENDING  
FAILED  KILLED  
--
Map 1 .. container     SUCCEEDED      1          1        0        0    
   0       0  
Reducer 2 .. container     SUCCEEDED      1          1        0        0    
   0       0  
--
VERTICES: 02/02  [==>>] 100%  ELAPSED TIME: 26.27 s    
--
INFO  : Status: DAG finished successfully in 26.06 seconds
INFO  : DAG ID: dag_1706163520799_38211_2
INFO  : 
INFO  : Query Execution Summary
INFO  : 
--
INFO  : OPERATION                            DURATION
INFO  : 
--
INFO  : Compile Query                           0.00s
INFO  : Prepare Plan                            0.00s
INFO  : Get Query Coordinator (AM)              0.00s
INFO  : Submit Plan                         1708509885.24s
INFO  : Start DAG                               0.04s
INFO  : Run DAG                                26.05s
INFO  : 
--
INFO  : 
INFO  : Task Execution Summary
INFO  : 
--
INFO  :   VERTICES      DURATION(ms)   CPU_TIME(ms)    GC_TIME(ms)   
INPUT_RECORDS   OUTPUT_RECORDS
INFO  : 
--
INFO  :      Map 1           3949.00          8,360            118              
 2                2
INFO  :  Reducer 2              0.00            820              0              
 2                0
INFO  : 
--
INFO  : 
INFO  : org.apache.tez.common.counters.DAGCounter:
INFO  :    NUM_SUCCEEDED_TASKS: 2
INFO  :    TOTAL_LAUNCHED_TASKS: 2
INFO  :    RACK_LOCAL_TASKS: 1
INFO  :    AM_CPU_MILLISECONDS: 2650
INFO  :    WALL_CLOCK_MILLIS: 3912
INFO  :    AM_GC_TIME_MILLIS: 0
INFO  :    INITIAL_HELD_CONTAINERS: 0
INFO  :    TOTAL_CONTAINERS_USED: 1
INFO  :    TOTAL_CONTAINER_ALLOCATION_COUNT: 1
INFO  :    TOTAL_CONTAINER_LAUNCH_COUNT: 1
INFO  :    TOTAL_CONTAINER_REUSE_COUNT: 1
INFO  : File System Counters:
INFO  :    FILE_BYTES_READ: 0
INFO  :    FILE_BYTES_WRITTEN: 0
INFO  :    FILE_READ_OPS: 0
INFO  :    FILE_LARGE_READ_OPS: 0
INFO  :    FILE_WRITE_OPS: 0
INFO  :    HDFS_BYTES_READ: 1210
INFO  :    HDFS_BYTES_WRITTEN: 121
INFO  :    HDFS_READ_OPS: 8
INFO  :    HDFS_LARGE_READ_OPS: 0
INFO  :    HDFS_WRITE_OPS: 2
INFO  : org.apache.tez.common.counters.TaskCounter:
INFO  :    SPILLED_RECORDS: 0
INFO  :    NUM_SHUFFLED_INPUTS: 0
INFO  :    NUM_FAILED_SHUFFLE_INPUTS: 0
INFO  :    GC_TIME_MILLIS: 118
INFO  :    CPU_MILLISECONDS: 9180
INFO  :    WALL_CLOCK_MILLISECONDS: 3726
INFO  :    PHYSICAL_MEMORY_BYTES: 3435134976
INFO  :    VIRTUAL_MEMORY_BYTES: 7564333056
INFO  :    COMMITTED_HEAP_BYTES: 3435134976
INFO  :    INPUT_RECORDS_PROCESSED: 4
INFO  :    INPUT_SPLIT_LENGTH_BYTES: 880
INFO  :    OUTPUT_RECORDS: 2
INFO  :    APPROXIMATE_INPUT_RECORDS: 2
INFO  :    OUTPUT_LARGE_RECORDS: 0
INFO  :    OUTPUT_BYTES: 8
INFO  :    OUTPUT_BYTES_WITH_OVERHEAD: 18
INFO  :    OUTPUT_BYTES_PHYSICAL: 46
INFO  :    ADDITIONAL_SPILLS_BYTES_WRITTEN: 0
INFO  :    ADDITIONAL_SPILLS_BYTES_READ: 0
INFO  :    ADDITIONAL_SPILL_COUNT: 0
INFO  :    SHUFFLE_BYTES: 0
INFO  :    SHUFFLE_BYTES_DECOMPRESSED: 0
INFO  :    SHUFFLE_BYTES_TO_MEM: 0
INFO  :    SHUFFLE_BYTES_TO_DISK: 0
INFO  :    SHUFFLE_BYTES_DISK_DIRECT: 0
INFO  :    SHUFFLE_PHASE_TIME: 58
INFO  :    FIRST_EVENT_RECEIVED: 58
INFO  :    LAST_EVENT_RECEIVED: 58
INFO  :    DATA_BYTES_VIA_EVENT: 22
INFO  : HIVE:
INFO  :    CREATED_FILES: 1
INFO  :    DESERIALIZE_ERRORS: 0
INFO  :    RECORDS_IN_Map_1: 2
INFO  :    RECORDS_OUT_0: 2
INFO  :    RECORDS_OUT_INTERMEDIATE_Map_1: 2
INFO  :    RECORDS_OUT_INTERMEDIATE_Reducer_2: 0
INFO  :    RECORDS_OUT_OPERATOR_FS_13: 2
INFO  :    RECORDS_OUT_OPERATOR_LIM_11: 2
INFO  :    RECORDS_OUT_OPERATOR_LIM_8: 2
INFO  :    RECORDS_OUT_OPERATOR_MAP_0: 0
INFO  :    RECORDS_OUT_OPERATOR_RS_10: 2
INFO  :    

[jira] [Commented] (HIVE-27898) HIVE4 can't use ICEBERG table in subqueries

2024-02-21 Thread yongzhi.shao (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17819172#comment-17819172
 ] 

yongzhi.shao commented on HIVE-27898:
-

That's strange. Our execution plans are different?

> HIVE4 can't use ICEBERG table in subqueries
> ---
>
> Key: HIVE-27898
> URL: https://issues.apache.org/jira/browse/HIVE-27898
> Project: Hive
>  Issue Type: Improvement
>  Components: Iceberg integration
>Affects Versions: 4.0.0-beta-1
>Reporter: yongzhi.shao
>Priority: Critical
>
> We found that when using the HIVE4-BETA1 version, if we use an ICEBERG table 
> in a subquery, we get no data back in the end.
> I have used HIVE3-TEZ for cross validation, and HIVE3 does not have this 
> problem when querying ICEBERG.
> {code:java}
> --spark3.4.1+iceberg 1.4.2
> CREATE TABLE datacenter.dwd.b_std_trade (
>   uni_order_id STRING,
>   data_from BIGINT,
>   partner STRING,
>   plat_code STRING,
>   order_id STRING,
>   uni_shop_id STRING,
>   uni_id STRING,
>   guide_id STRING,
>   shop_id STRING,
>   plat_account STRING,
>   total_fee DOUBLE,
>   item_discount_fee DOUBLE,
>   trade_discount_fee DOUBLE,
>   adjust_fee DOUBLE,
>   post_fee DOUBLE,
>   discount_rate DOUBLE,
>   payment_no_postfee DOUBLE,
>   payment DOUBLE,
>   pay_time STRING,
>   product_num BIGINT,
>   order_status STRING,
>   is_refund STRING,
>   refund_fee DOUBLE,
>   insert_time STRING,
>   created STRING,
>   endtime STRING,
>   modified STRING,
>   trade_type STRING,
>   receiver_name STRING,
>   receiver_country STRING,
>   receiver_state STRING,
>   receiver_city STRING,
>   receiver_district STRING,
>   receiver_town STRING,
>   receiver_address STRING,
>   receiver_mobile STRING,
>   trade_source STRING,
>   delivery_type STRING,
>   consign_time STRING,
>   orders_num BIGINT,
>   is_presale BIGINT,
>   presale_status STRING,
>   first_fee_paytime STRING,
>   last_fee_paytime STRING,
>   first_paid_fee DOUBLE,
>   tenant STRING,
>   tidb_modified STRING,
>   step_paid_fee DOUBLE,
>   seller_flag STRING,
>   is_used_store_card BIGINT,
>   store_card_used DOUBLE,
>   store_card_basic_used DOUBLE,
>   store_card_expand_used DOUBLE,
>   order_promotion_num BIGINT,
>   item_promotion_num BIGINT,
>   buyer_remark STRING,
>   seller_remark STRING,
>   trade_business_type STRING)
> USING iceberg
> PARTITIONED BY (uni_shop_id, truncate(4, created))
> LOCATION '/iceberg-catalog/warehouse/dwd/b_std_trade'
> TBLPROPERTIES (
>   'current-snapshot-id' = '7217819472703702905',
>   'format' = 'iceberg/orc',
>   'format-version' = '1',
>   'hive.stored-as' = 'iceberg',
>   'read.orc.vectorization.enabled' = 'true',
>   'sort-order' = 'uni_shop_id ASC NULLS FIRST, created ASC NULLS FIRST',
>   'write.distribution-mode' = 'hash',
>   'write.format.default' = 'orc',
>   'write.metadata.delete-after-commit.enabled' = 'true',
>   'write.metadata.previous-versions-max' = '3',
>   'write.orc.bloom.filter.columns' = 'order_id',
>   'write.orc.compression-codec' = 'zstd')
> --hive-iceberg
>  CREATE EXTERNAL TABLE iceberg_dwd.b_std_trade 
>  STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler' 
> LOCATION 'hdfs:///iceberg-catalog/warehouse/dwd/b_std_trade'
> TBLPROPERTIES 
> ('iceberg.catalog'='location_based_table','engine.hive.enabled'='true');
> select * from iceberg_dwd.b_std_trade 
> where uni_shop_id = 'TEST|1' limit 10  --10 rows
> select *
> from ( 
> select * from iceberg_dwd.b_std_trade 
> where uni_shop_id = 'TEST|1' limit 10
> ) t1;   --10 rows
> select uni_shop_id
> from ( 
> select * from iceberg_dwd.b_std_trade 
> where uni_shop_id = 'TEST|1' limit 10
> ) t1;  --0 rows
> select uni_shop_id
> from ( 
> select uni_shop_id as uni_shop_id from iceberg_dwd.b_std_trade 
> where uni_shop_id = 'TEST|1' limit 10
> ) t1;  --0 rows
> --hive-orc
> select uni_shop_id
> from ( 
> select * from iceberg_dwd.trade_test 
> where uni_shop_id = 'TEST|1' limit 10
> ) t1;--10 ROWS{code}
>  





[jira] [Commented] (HIVE-27898) HIVE4 can't use ICEBERG table in subqueries

2024-02-21 Thread Butao Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17819169#comment-17819169
 ] 

Butao Zhang commented on HIVE-27898:


*I have tested the two cases, with and without a Tez task; both are fine.*

 

// set conf in beeline:

set hive.fetch.task.conversion=none;  //enforce launching tez task
set hive.explain.user=false;
{code:java}
0: jdbc:hive2://127.0.0.1:10004/default> select * from (select *  from 
testdb.test_data_02 limit 10) s1
--
        VERTICES      MODE        STATUS  TOTAL  COMPLETED  RUNNING  PENDING  
FAILED  KILLED
--
Map 1 .. container     SUCCEEDED      1          1        0        0    
   0       0
--
VERTICES: 01/01  [==>>] 100%  ELAPSED TIME: 21.55 s
--
INFO  : Completed executing 
command(queryId=hive_20240221181020_b4a5968f-3594-4b1a-bc24-1b968cf6a993); Time 
taken: 6.12 seconds
++--+
| s1.id  | s1.name  |
++--+
| 1      | a        |
| 2      | b        |
++--+
2 rows selected (6.442 seconds)


0: jdbc:hive2://127.0.0.1:1/default> explain select * from (select *  from 
testdb.test_data_02 limit 10) s1;
++
|                      Explain                       |
++
| STAGE DEPENDENCIES:                                |
|   Stage-1 is a root stage                          |
|   Stage-0 depends on stages: Stage-1               |
|                                                    |
| STAGE PLANS:                                       |
|   Stage: Stage-1                                   |
|     Tez                                            |
|       DagId: hive_20240221181118_d4e74ca2-cc3b-426c-830a-ee32fd3801b4:4 |
|       DagName: hive_20240221181118_d4e74ca2-cc3b-426c-830a-ee32fd3801b4:4 |
|       Vertices:                                    |
|         Map 1                                      |
|             Map Operator Tree:                     |
|                 TableScan                          |
|                   alias: test_data_02              |
|                   Statistics: Num rows: 2 Data size: 736 Basic stats: 
COMPLETE Column stats: NONE |
|                   Limit                            |
|                     Number of rows: 10             |
|                     Statistics: Num rows: 2 Data size: 736 Basic stats: 
COMPLETE Column stats: NONE |
|                     Select Operator                |
|                       expressions: id (type: string), name (type: string) |
|                       outputColumnNames: _col0, _col1 |
|                       Statistics: Num rows: 2 Data size: 736 Basic stats: 
COMPLETE Column stats: NONE |
|                       File Output Operator         |
|                         compressed: false          |
|                         Statistics: Num rows: 2 Data size: 736 Basic stats: 
COMPLETE Column stats: NONE |
|                         table:                     |
|                             input format: 
org.apache.hadoop.mapred.SequenceFileInputFormat |
|                             output format: 
org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat |
|                             serde: 
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe |
|             Execution mode: vectorized             |
|                                                    |
|   Stage: Stage-0                                   |
|     Fetch Operator                                 |
|       limit: 10                                    |
|       Processor Tree:                              |
|         ListSink                                   |
|                                                    |
++
37 rows selected (0.294 seconds) {code}

> HIVE4 can't use ICEBERG table in subqueries
> ---
>
> Key: HIVE-27898
> URL: https://issues.apache.org/jira/browse/HIVE-27898
> Project: Hive
>  Issue Type: Improvement
>  Components: Iceberg integration
>Affects Versions: 4.0.0-beta-1
>Reporter: yongzhi.shao
>Priority: Critical
>
> We found that when using the HIVE4-BETA1 version, if we use an ICEBERG table 
> in a subquery, we get no data back in the end.
> I have used HIVE3-TEZ for cross validation, and HIVE3 does not have this 
> problem when querying ICEBERG.
> {code:java}
> --spark3.4.1+iceberg 1.4.2
> 

[jira] [Comment Edited] (HIVE-27898) HIVE4 can't use ICEBERG table in subqueries

2024-02-21 Thread yongzhi.shao (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17819167#comment-17819167
 ] 

yongzhi.shao edited comment on HIVE-27898 at 2/21/24 10:08 AM:
---

Did HIVE start a TEZ task when you executed this SQL?
select * from (select *  from testdb.test_data_02 limit 10) s1;

 

Also, can you share the SQL-plan?

here is mine:

 

{code:java}
Explain  Vertex dependency in root stage
Explain  Reducer 2 <- Map 1 (CUSTOM_SIMPLE_EDGE)
Explain
Explain  Stage-0
Explain    Fetch Operator
Explain      limit:-1
Explain      Stage-1
Explain        Reducer 2 vectorized
Explain        File Output Operator [FS_13]
Explain          Select Operator [SEL_12] (rows=2 width=4430)
Explain            Output:["_col0","_col1"]
Explain            Limit [LIM_11] (rows=2 width=4430)
Explain              Number of rows:10
Explain            <-Map 1 [CUSTOM_SIMPLE_EDGE] vectorized
Explain              PARTITION_ONLY_SHUFFLE [RS_10]
Explain                Select Operator [SEL_9] (rows=2 width=4430)
Explain                  Output:["_col0","_col1"]
Explain                  Limit [LIM_8] (rows=2 width=4430)
Explain                    Number of rows:10
Explain                    TableScan [TS_0] (rows=2 width=4430)
Explain                      iceberg_dwd@test_data_02,test_data_02,Tbl:COMPLETE,Col:NONE,Output:["id","name"]
Explain
22 rows selected (0.979 seconds)
0: jdbc:hive2://smaster01:1> explain select * from (select * from iceberg_dwd.test_data_02 limit 10) s1;
{code}


was (Author: lisoda):
Did HIVE start a TEZ task when you executed this SQL?
select * from (select *  from testdb.test_data_02 limit 10) s1;

 

Also, can you share the implementation plan?

here is mine:

 

{code:java}
Explain  Vertex dependency in root stage
Explain  Reducer 2 <- Map 1 (CUSTOM_SIMPLE_EDGE)
Explain
Explain  Stage-0
Explain    Fetch Operator
Explain      limit:-1
Explain      Stage-1
Explain        Reducer 2 vectorized
Explain        File Output Operator [FS_13]
Explain          Select Operator [SEL_12] (rows=2 width=4430)
Explain            Output:["_col0","_col1"]
Explain            Limit [LIM_11] (rows=2 width=4430)
Explain              Number of rows:10
Explain            <-Map 1 [CUSTOM_SIMPLE_EDGE] vectorized
Explain              PARTITION_ONLY_SHUFFLE [RS_10]
Explain                Select Operator [SEL_9] (rows=2 width=4430)
Explain                  Output:["_col0","_col1"]
Explain                  Limit [LIM_8] (rows=2 width=4430)
Explain                    Number of rows:10
Explain                    TableScan [TS_0] (rows=2 width=4430)
Explain                      iceberg_dwd@test_data_02,test_data_02,Tbl:COMPLETE,Col:NONE,Output:["id","name"]
Explain
22 rows selected (0.979 seconds)
0: jdbc:hive2://smaster01:1> explain select * from (select * from iceberg_dwd.test_data_02 limit 10) s1;
{code}

[jira] [Comment Edited] (HIVE-27898) HIVE4 can't use ICEBERG table in subqueries

2024-02-21 Thread yongzhi.shao (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17819167#comment-17819167
 ] 

yongzhi.shao edited comment on HIVE-27898 at 2/21/24 10:06 AM:
---

Did HIVE start a TEZ task when you executed this SQL?
select * from (select *  from testdb.test_data_02 limit 10) s1;

 

Also, can you share the implementation plan?

here is mine:
{code:java}
Explain  Vertex dependency in root stage
Explain  Reducer 2 <- Map 1 (CUSTOM_SIMPLE_EDGE)
Explain
Explain  Stage-0
Explain    Fetch Operator
Explain      limit:-1
Explain      Stage-1
Explain        Reducer 2 vectorized
Explain        File Output Operator [FS_13]
Explain          Select Operator [SEL_12] (rows=2 width=4430)
Explain            Output:["_col0","_col1"]
Explain            Limit [LIM_11] (rows=2 width=4430)
Explain              Number of rows:10
Explain            <-Map 1 [CUSTOM_SIMPLE_EDGE] vectorized
Explain              PARTITION_ONLY_SHUFFLE [RS_10]
Explain                Select Operator [SEL_9] (rows=2 width=4430)
Explain                  Output:["_col0","_col1"]
Explain                  Limit [LIM_8] (rows=2 width=4430)
Explain                    Number of rows:10
Explain                    TableScan [TS_0] (rows=2 width=4430)
Explain                      iceberg_dwd@test_data_02,test_data_02,Tbl:COMPLETE,Col:NONE,Output:["id","name"]
Explain
22 rows selected (0.979 seconds)
0: jdbc:hive2://smaster01:1> explain select * from (select * from iceberg_dwd.test_data_02 limit 10) s1; {code}


was (Author: lisoda):
Did HIVE start a TEZ task when you executed this SQL?
select * from (select *  from testdb.test_data_02 limit 10) s1;

[jira] [Commented] (HIVE-27898) HIVE4 can't use ICEBERG table in subqueries

2024-02-21 Thread yongzhi.shao (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17819167#comment-17819167
 ] 

yongzhi.shao commented on HIVE-27898:
-

Did HIVE start a TEZ task when you executed this SQL?
select * from (select *  from testdb.test_data_02 limit 10) s1;

> HIVE4 can't use ICEBERG table in subqueries
> ---
>
> Key: HIVE-27898
> URL: https://issues.apache.org/jira/browse/HIVE-27898
> Project: Hive
>  Issue Type: Improvement
>  Components: Iceberg integration
>Affects Versions: 4.0.0-beta-1
>Reporter: yongzhi.shao
>Priority: Critical
>
> Currently, we found that when using HIVE4-BETA1 version, if we use ICEBERG 
> table in the subquery, we can't get any data in the end.
> I have used HIVE3-TEZ for cross validation and HIVE3 does not have this 
> problem when querying ICEBERG.
> {code:java}
> --spark3.4.1+iceberg 1.4.2
> CREATE TABLE datacenter.dwd.b_std_trade (
>   uni_order_id STRING,
>   data_from BIGINT,
>   partner STRING,
>   plat_code STRING,
>   order_id STRING,
>   uni_shop_id STRING,
>   uni_id STRING,
>   guide_id STRING,
>   shop_id STRING,
>   plat_account STRING,
>   total_fee DOUBLE,
>   item_discount_fee DOUBLE,
>   trade_discount_fee DOUBLE,
>   adjust_fee DOUBLE,
>   post_fee DOUBLE,
>   discount_rate DOUBLE,
>   payment_no_postfee DOUBLE,
>   payment DOUBLE,
>   pay_time STRING,
>   product_num BIGINT,
>   order_status STRING,
>   is_refund STRING,
>   refund_fee DOUBLE,
>   insert_time STRING,
>   created STRING,
>   endtime STRING,
>   modified STRING,
>   trade_type STRING,
>   receiver_name STRING,
>   receiver_country STRING,
>   receiver_state STRING,
>   receiver_city STRING,
>   receiver_district STRING,
>   receiver_town STRING,
>   receiver_address STRING,
>   receiver_mobile STRING,
>   trade_source STRING,
>   delivery_type STRING,
>   consign_time STRING,
>   orders_num BIGINT,
>   is_presale BIGINT,
>   presale_status STRING,
>   first_fee_paytime STRING,
>   last_fee_paytime STRING,
>   first_paid_fee DOUBLE,
>   tenant STRING,
>   tidb_modified STRING,
>   step_paid_fee DOUBLE,
>   seller_flag STRING,
>   is_used_store_card BIGINT,
>   store_card_used DOUBLE,
>   store_card_basic_used DOUBLE,
>   store_card_expand_used DOUBLE,
>   order_promotion_num BIGINT,
>   item_promotion_num BIGINT,
>   buyer_remark STRING,
>   seller_remark STRING,
>   trade_business_type STRING)
> USING iceberg
> PARTITIONED BY (uni_shop_id, truncate(4, created))
> LOCATION '/iceberg-catalog/warehouse/dwd/b_std_trade'
> TBLPROPERTIES (
>   'current-snapshot-id' = '7217819472703702905',
>   'format' = 'iceberg/orc',
>   'format-version' = '1',
>   'hive.stored-as' = 'iceberg',
>   'read.orc.vectorization.enabled' = 'true',
>   'sort-order' = 'uni_shop_id ASC NULLS FIRST, created ASC NULLS FIRST',
>   'write.distribution-mode' = 'hash',
>   'write.format.default' = 'orc',
>   'write.metadata.delete-after-commit.enabled' = 'true',
>   'write.metadata.previous-versions-max' = '3',
>   'write.orc.bloom.filter.columns' = 'order_id',
>   'write.orc.compression-codec' = 'zstd')
> --hive-iceberg
>  CREATE EXTERNAL TABLE iceberg_dwd.b_std_trade 
>  STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler' 
> LOCATION 'hdfs:///iceberg-catalog/warehouse/dwd/b_std_trade'
> TBLPROPERTIES 
> ('iceberg.catalog'='location_based_table','engine.hive.enabled'='true');
> select * from iceberg_dwd.b_std_trade 
> where uni_shop_id = 'TEST|1' limit 10  --10 rows
> select *
> from ( 
> select * from iceberg_dwd.b_std_trade 
> where uni_shop_id = 'TEST|1' limit 10
> ) t1;   --10 rows
> select uni_shop_id
> from ( 
> select * from iceberg_dwd.b_std_trade 
> where uni_shop_id = 'TEST|1' limit 10
> ) t1;  --0 rows
> select uni_shop_id
> from ( 
> select uni_shop_id as uni_shop_id from iceberg_dwd.b_std_trade 
> where uni_shop_id = 'TEST|1' limit 10
> ) t1;  --0 rows
> --hive-orc
> select uni_shop_id
> from ( 
> select * from iceberg_dwd.trade_test 
> where uni_shop_id = 'TEST|1' limit 10
> ) t1;--10 ROWS{code}
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-27898) HIVE4 can't use ICEBERG table in subqueries

2024-02-21 Thread Butao Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17819165#comment-17819165
 ] 

Butao Zhang commented on HIVE-27898:


I just ran the test again based on the master branch, and everything is OK, just 
like my earlier test at 
https://issues.apache.org/jira/browse/HIVE-27898?focusedCommentId=17790407=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17790407
 . I think branch-4.0 is also fine, so I am not sure why you still hit this 
issue. 

Maybe you can try the master branch again? 

 
{code:java}
0: jdbc:hive2://127.0.0.1:1/default> select name from (select * from 
testdb.test_data_02 limit 10) s1;
+-------+
| name  |
+-------+
| a     |
| b     |
+-------+
2 rows selected (0.382 seconds)

0: jdbc:hive2://127.0.0.1:1/default> select id from (select * from 
testdb.test_data_02 limit 10) s1;
+-----+
| id  |
+-----+
| 1   |
| 2   |
+-----+
2 rows selected (0.372 seconds)


0: jdbc:hive2://127.0.0.1:1/default> select *  from testdb.test_data_02 
limit 10;
+------------------+--------------------+
| test_data_02.id  | test_data_02.name  |
+------------------+--------------------+
| 1                | a                  |
| 2                | b                  |
+------------------+--------------------+
2 rows selected (0.355 seconds)

0: jdbc:hive2://127.0.0.1:1/default> select * from (select *  from 
testdb.test_data_02 limit 10) s1;
+--------+----------+
| s1.id  | s1.name  |
+--------+----------+
| 1      | a        |
| 2      | b        |
+--------+----------+
2 rows selected (0.346 seconds){code}
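
Since the same queries succeed on one build and fail on another, it may help to compare the effective settings and optimized plans on both clusters before bisecting code. Below is a minimal diagnostic sketch (an editorial addition, not from the thread); the properties shown are standard Hive settings, and whether any of them influences this bug is an assumption to verify, not a confirmed cause:
{code:sql}
-- Record the exact build under test on each cluster.
select version();

-- Print the current values of settings that differ most often between
-- test environments (running "set <name>;" echoes the value in Beeline).
set hive.vectorized.execution.enabled;
set hive.fetch.task.conversion;
set hive.cbo.enable;

-- Compare the optimized plans: if the failing build prunes the id column,
-- the TableScan's Output:[...] list will be narrower than on the good build.
explain select id from (select * from iceberg_dwd.test_data_02 limit 10) s1;
{code}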
 

 

[jira] [Created] (HIVE-28084) Iceberg: COW fix for Merge operation

2024-02-21 Thread Denys Kuzmenko (Jira)
Denys Kuzmenko created HIVE-28084:
-

 Summary: Iceberg: COW fix for Merge operation
 Key: HIVE-28084
 URL: https://issues.apache.org/jira/browse/HIVE-28084
 Project: Hive
  Issue Type: Bug
Reporter: Denys Kuzmenko






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (HIVE-27898) HIVE4 can't use ICEBERG table in subqueries

2024-02-21 Thread yongzhi.shao (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17819145#comment-17819145
 ] 

yongzhi.shao edited comment on HIVE-27898 at 2/21/24 9:49 AM:
--

[~zhangbutao]:

Hello. I have now re-tested with the branch-4.0 code and the problem still 
exists. Do I need to test with the master branch?

Also, do we need to add this issue to HIVE-27945 if we can't backport the 
associated patch before 4.0.0 is released?

 
{code:java}
 spark3.4.1+iceberg 1.4.3
CREATE TABLE IF NOT EXISTS datacenter.test.test_data_02 (
id string,name string
)
PARTITIONED BY (name)
TBLPROPERTIES 
('read.orc.vectorization.enabled'='true','write.format.default'='orc','write.orc.bloom.filter.columns'='id','write.orc.compression-codec'='zstd','write.metadata.previous-versions-max'='3','write.metadata.delete-after-commit.enabled'='true')
STORED AS iceberg;

insert into datacenter.test.test_data_02(id,name) values('1','a'),('2','b');




--hive 4.0.0-beta-2 (4.0 branch)
CREATE EXTERNAL TABLE iceberg_dwd.test_data_02 
 STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler' 
LOCATION 
'hdfs://xx/iceberg-catalog/warehouse/test/test_data_02'TBLPROPERTIES 
('iceberg.catalog'='location_based_table','engine.hive.enabled'='true');

select id from (select * from iceberg_dwd.test_data_02) s1;             -- 2 rows

select name from (select * from iceberg_dwd.test_data_02 limit 10) s1;  -- 2 rows

select id from (select * from iceberg_dwd.test_data_02 limit 10) s1;    -- id is NULL in every row {code}


was (Author: lisoda):
[~zhangbutao] :

Hello. I have now re-tested with the branch-4.0 code and the problem still 
exists. Do I need to test with the master branch?

Also, do we need to add this issue to HIVE-27945 Add 4.0 known issues into TSB 
- ASF JIRA (apache.org) if we can't backport the patch associated with 4.0.0 
when it is released?

 
{code:java}
 spark3.4.1+iceberg 1.4.3
CREATE TABLE IF NOT EXISTS datacenter.test.test_data_02 (
id string,name string
)
PARTITIONED BY (name)
TBLPROPERTIES 
('read.orc.vectorization.enabled'='true','write.format.default'='orc','write.orc.bloom.filter.columns'='id','write.orc.compression-codec'='zstd','write.metadata.previous-versions-max'='3','write.metadata.delete-after-commit.enabled'='true')
STORED AS iceberg;

insert into datacenter.test.test_data_02(id,name) values('1','a'),('2','b');




--hive 4.0.0-beta-2 (4.0 branch)
CREATE EXTERNAL TABLE iceberg_dwd.test_data_02 
 STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler' 
LOCATION 
'hdfs://xx/iceberg-catalog/warehouse/test/test_data_02'TBLPROPERTIES 
('iceberg.catalog'='location_based_table','engine.hive.enabled'='true');

select id from (select * from iceberg_dwd.test_data_02) s1;             -- 2 rows

select name from (select * from iceberg_dwd.test_data_02 limit 10) s1;  -- 2 rows

select id from (select * from iceberg_dwd.test_data_02 limit 10) s1;    -- id is NULL in every row {code}
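
The queries above suggest the subquery still returns the right number of rows but the projected id values are nulled out. A quick way to confirm that distinction (an editorial diagnostic sketch, not part of the original report):
{code:sql}
-- If rows were being lost, this would return 0; the report implies it returns 2.
select count(*) from (select * from iceberg_dwd.test_data_02 limit 10) s1;

-- count(id) skips NULLs, so 0 here together with count(*) = 2 above would
-- pinpoint the projection, not the scan, as the culprit.
select count(id) from (select * from iceberg_dwd.test_data_02 limit 10) s1;
{code}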


[jira] [Comment Edited] (HIVE-27898) HIVE4 can't use ICEBERG table in subqueries

2024-02-21 Thread yongzhi.shao (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17819157#comment-17819157
 ] 

yongzhi.shao edited comment on HIVE-27898 at 2/21/24 9:47 AM:
--

A more interesting phenomenon:
{code:java}
test_data_02.id    1
test_data_02.name  a
test_data_02.id    2
test_data_02.name  b
jdbc:hive2://xxx:1> select *  from iceberg_dwd.test_data_02 limit 10; 



s1.id    NULL
s1.name  a
s1.id    NULL
s1.name  b
jdbc:hive2://xx:1> select * from (select *  from 
iceberg_dwd.test_data_02 limit 10) s1;{code}
Where did the data in the id column go?


was (Author: lisoda):
A more interesting phenomenon:
{code:java}
test_data_02.id    1
test_data_02.name  a
test_data_02.id    2
test_data_02.name  b
jdbc:hive2://smaster01:1> select *  from iceberg_dwd.test_data_02 limit 10; 



s1.id    NULL
s1.name  a
s1.id    NULL
s1.name  b
jdbc:hive2://smaster01:1> select * from (select *  from 
iceberg_dwd.test_data_02 limit 10) s1;{code}
Where did the data in the id column go?


[jira] [Commented] (HIVE-28071) Sync jetty version across modules

2024-02-21 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-28071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17819159#comment-17819159
 ] 

Ayush Saxena commented on HIVE-28071:
-

Committed to master.

Thanx [~Aggarwal_Raghav] for the contribution!!!
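A common way to prevent this kind of version drift is to declare the Jetty version once as a Maven property in the parent pom and reference it from every module. A minimal sketch (the property name `jetty.version` and the `jetty-server` artifact are illustrative; the actual patch may differ):
{code:xml}
<!-- parent pom.xml: single source of truth for the Jetty version -->
<properties>
  <jetty.version>9.4.45.v20220203</jetty.version>
</properties>

<!-- standalone-metastore/pom.xml, itests/qtest-druid/pom.xml, etc.:
     reference the property instead of hard-coding a version -->
<dependency>
  <groupId>org.eclipse.jetty</groupId>
  <artifactId>jetty-server</artifactId>
  <version>${jetty.version}</version>
</dependency>
{code}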

> Sync jetty version across modules
> -
>
> Key: HIVE-28071
> URL: https://issues.apache.org/jira/browse/HIVE-28071
> Project: Hive
>  Issue Type: Improvement
>Reporter: Raghav Aggarwal
>Assignee: Raghav Aggarwal
>Priority: Minor
>  Labels: pull-request-available
>
> In pom.xml, we have jetty version 9.4.45.v20220203, but in 
> standalone-metastore/pom.xml and itests/qtest-druid/pom.xml we have jetty 
> version 9.4.40.v20210413.
> Looks like it was missed during HIVE-27757.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HIVE-28071) Sync jetty version across modules

2024-02-21 Thread Ayush Saxena (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-28071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ayush Saxena resolved HIVE-28071.
-
Fix Version/s: 4.1.0
   Resolution: Fixed

> Sync jetty version across modules
> -
>
> Key: HIVE-28071
> URL: https://issues.apache.org/jira/browse/HIVE-28071
> Project: Hive
>  Issue Type: Improvement
>Reporter: Raghav Aggarwal
>Assignee: Raghav Aggarwal
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.1.0
>
>
> In pom.xml, we have jetty version 9.4.45.v20220203, but in 
> standalone-metastore/pom.xml and itests/qtest-druid/pom.xml we have jetty 
> version 9.4.40.v20210413.
> Looks like it was missed during HIVE-27757.





[jira] [Commented] (HIVE-27898) HIVE4 can't use ICEBERG table in subqueries

2024-02-21 Thread yongzhi.shao (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17819157#comment-17819157
 ] 

yongzhi.shao commented on HIVE-27898:
-

A more interesting phenomenon:
{code:java}
test_data_02.id    1
test_data_02.name  a
test_data_02.id    2
test_data_02.name  b
jdbc:hive2://smaster01:1> select *  from iceberg_dwd.test_data_02 limit 10; 



s1.id    NULL
s1.name  a
s1.id    NULL
s1.name  b
jdbc:hive2://smaster01:1> select * from (select *  from 
iceberg_dwd.test_data_02 limit 10) s1;{code}
Where did the data in the id column go?






[jira] [Comment Edited] (HIVE-27898) HIVE4 can't use ICEBERG table in subqueries

2024-02-21 Thread yongzhi.shao (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17819152#comment-17819152
 ] 

yongzhi.shao edited comment on HIVE-27898 at 2/21/24 9:36 AM:
--

[~zhangbutao] 

Hello. I'm now testing with the 4.0 branch I just compiled; most of the problems are indeed fixed, but this SQL problem remains.
{code:java}
0: jdbc:hive2://x:1> select version();
_c0  4.0.0-beta-2-SNAPSHOT rf355c82a5aa77ef1496b35c22b8ac9b84dfe1780

select name from (select * from iceberg_dwd.test_data_02 limit 10) s1; -- 2 rows

select id from (select * from iceberg_dwd.test_data_02 limit 10) s1; -- all rows are NULL{code}
Can't I select a column named id from the result of a subquery?
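A couple of cross-checks could help isolate whether the column name, the `SELECT *` expansion, or the LIMIT-in-subquery shape is the trigger; these queries are only sketches against the `test_data_02` table from the report above:
{code:sql}
-- Does renaming the column inside the subquery avoid the NULLs?
SELECT id2 FROM (SELECT id AS id2, name FROM iceberg_dwd.test_data_02 LIMIT 10) s1;

-- Does projecting explicit columns instead of SELECT * change the result?
SELECT id FROM (SELECT id, name FROM iceberg_dwd.test_data_02 LIMIT 10) s1;
{code}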


was (Author: lisoda):
[~zhangbutao] 

Hello. I'm now testing with the 4.0 branch I just compiled; most of the problems are indeed fixed, but this SQL problem remains.
{code:java}
0: jdbc:hive2://x:1> select version();
_c0  4.0.0-beta-2-SNAPSHOT rf355c82a5aa77ef1496b35c22b8ac9b84dfe1780

select id from (select * from iceberg_dwd.test_data_02 limit 10) s1; -- all rows are NULL{code}
Can't I select a column named id from the result of a subquery?


[jira] [Comment Edited] (HIVE-27898) HIVE4 can't use ICEBERG table in subqueries

2024-02-21 Thread yongzhi.shao (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17819152#comment-17819152
 ] 

yongzhi.shao edited comment on HIVE-27898 at 2/21/24 9:32 AM:
--

[~zhangbutao] 

Hello. I'm now testing with the 4.0 branch I just compiled; most of the problems are indeed fixed, but this SQL problem remains.
{code:java}
0: jdbc:hive2://x:1> select version();
_c0  4.0.0-beta-2-SNAPSHOT rf355c82a5aa77ef1496b35c22b8ac9b84dfe1780

select id from (select * from iceberg_dwd.test_data_02 limit 10) s1; -- all rows are NULL{code}
Can't I select a column named id from the result of a subquery?


was (Author: lisoda):
[~zhangbutao] 

Hello. I'm now testing with the 4.0 branch I just compiled; most of the problems are indeed fixed, but this SQL problem remains.
{code:java}
select id from (select * from iceberg_dwd.test_data_02 limit 10) s1; -- all rows are NULL{code}
Can't I select a column named id from the result of a subquery?





[jira] [Commented] (HIVE-27898) HIVE4 can't use ICEBERG table in subqueries

2024-02-21 Thread yongzhi.shao (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17819152#comment-17819152
 ] 

yongzhi.shao commented on HIVE-27898:
-

[~zhangbutao] 

Hello. I'm now testing with the 4.0 branch I just compiled; most of the problems are indeed fixed, but this SQL problem remains.
{code:java}
select id from (select * from iceberg_dwd.test_data_02 limit 10) s1; -- all rows are NULL{code}
Can't I select a column named id from the result of a subquery?






[jira] [Comment Edited] (HIVE-27898) HIVE4 can't use ICEBERG table in subqueries

2024-02-21 Thread yongzhi.shao (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17819145#comment-17819145
 ] 

yongzhi.shao edited comment on HIVE-27898 at 2/21/24 9:27 AM:
--

[~zhangbutao] :

Hello. Now I have re-tested with the branch-4.0 code and the problem still exists. Do I need to test with the master branch?

Also, do we need to add this issue to HIVE-27945 (Add 4.0 known issues into TSB) if we can't backport the associated patch before 4.0.0 is released?

 
{code:java}
 spark3.4.1+iceberg 1.4.3
CREATE TABLE IF NOT EXISTS datacenter.test.test_data_02 (
id string,name string
)
PARTITIONED BY (name)
TBLPROPERTIES 
('read.orc.vectorization.enabled'='true','write.format.default'='orc','write.orc.bloom.filter.columns'='id','write.orc.compression-codec'='zstd','write.metadata.previous-versions-max'='3','write.metadata.delete-after-commit.enabled'='true')
STORED AS iceberg;

insert into datacenter.test.test_data_02(id,name) values('1','a'),('2','b');




--hive 4.0.0-beta-2 (4.0 branch)
CREATE EXTERNAL TABLE iceberg_dwd.test_data_02 
 STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler' 
LOCATION 'hdfs://xx/iceberg-catalog/warehouse/test/test_data_02'
TBLPROPERTIES 
('iceberg.catalog'='location_based_table','engine.hive.enabled'='true');

select id from (select * from iceberg_dwd.test_data_02) s1; -- 2 rows

select name from (select * from iceberg_dwd.test_data_02 limit 10) s1; -- 2 rows

select id from (select * from iceberg_dwd.test_data_02 limit 10) s1; -- all rows are NULL{code}


was (Author: lisoda):
[~zhangbutao] :

Hello. Now I have re-tested with the branch-4.0 code and the problem still exists. Do I need to test with the master branch?

Also, do we need to add this issue to HIVE-27945 (Add 4.0 known issues into TSB) if we can't backport the associated patch before 4.0.0 is released?

 
{code:java}
 spark3.4.1+iceberg 1.4.3
CREATE TABLE IF NOT EXISTS datacenter.test.test_data_02 (
id string,name string
)
PARTITIONED BY (name)
TBLPROPERTIES 
('read.orc.vectorization.enabled'='true','write.format.default'='orc','write.orc.bloom.filter.columns'='id','write.orc.compression-codec'='zstd','write.metadata.previous-versions-max'='3','write.metadata.delete-after-commit.enabled'='true')
STORED AS iceberg;

insert into datacenter.test.test_data_02(id,name) values('1','a'),('2','b');




--hive 4.0.0-beta-2 (4.0 branch)
CREATE EXTERNAL TABLE iceberg_dwd.test_data_02 
 STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler' 
LOCATION 'hdfs://xx/iceberg-catalog/warehouse/test/test_data_02'
TBLPROPERTIES 
('iceberg.catalog'='location_based_table','engine.hive.enabled'='true');
select id from (select * from iceberg_dwd.test_data_02) s1; -- 2 rows
select id from (select * from iceberg_dwd.test_data_02 limit 10) s1; -- all rows are NULL{code}


[jira] [Comment Edited] (HIVE-27898) HIVE4 can't use ICEBERG table in subqueries

2024-02-21 Thread yongzhi.shao (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17819145#comment-17819145
 ] 

yongzhi.shao edited comment on HIVE-27898 at 2/21/24 9:25 AM:
--

[~zhangbutao] :

Hello. Now I have re-tested with the branch-4.0 code and the problem still exists. Do I need to test with the master branch?

Also, do we need to add this issue to HIVE-27945 (Add 4.0 known issues into TSB) if we can't backport the associated patch before 4.0.0 is released?

 
{code:java}
-- spark 3.4.1 + iceberg 1.4.3
CREATE TABLE IF NOT EXISTS datacenter.test.test_data_02 (
  id string,
  name string
)
PARTITIONED BY (name)
TBLPROPERTIES (
  'read.orc.vectorization.enabled'='true',
  'write.format.default'='orc',
  'write.orc.bloom.filter.columns'='id',
  'write.orc.compression-codec'='zstd',
  'write.metadata.previous-versions-max'='3',
  'write.metadata.delete-after-commit.enabled'='true')
STORED AS iceberg;

insert into datacenter.test.test_data_02(id,name) values('1','a'),('2','b');

-- hive 4.0.0-beta-2 (branch-4.0)
CREATE EXTERNAL TABLE iceberg_dwd.test_data_02
STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler'
LOCATION 'hdfs://xx/iceberg-catalog/warehouse/test/test_data_02'
TBLPROPERTIES ('iceberg.catalog'='location_based_table','engine.hive.enabled'='true');

select id from (select * from iceberg_dwd.test_data_02) s1;          -- 2 rows
select id from (select * from iceberg_dwd.test_data_02 limit 10) s1; -- all rows are NULL
{code}




[jira] [Commented] (HIVE-27898) HIVE4 can't use ICEBERG table in subqueries

2024-02-21 Thread Butao Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17819150#comment-17819150
 ] 

Butao Zhang commented on HIVE-27898:


https://issues.apache.org/jira/browse/HIVE-27898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17790407#comment-17790407

branch-4.0 has all the important fixes. Based on the tests I did before, I don't 
think branch-4.0 has this issue.

> HIVE4 can't use ICEBERG table in subqueries
> ---
>
> Key: HIVE-27898
> URL: https://issues.apache.org/jira/browse/HIVE-27898
> Project: Hive
>  Issue Type: Improvement
>  Components: Iceberg integration
>Affects Versions: 4.0.0-beta-1
>Reporter: yongzhi.shao
>Priority: Critical
>
> We found that with the HIVE4-BETA1 version, if we use an ICEBERG table in a 
> subquery, we get no data back.
> I cross-validated with HIVE3 on TEZ; HIVE3 does not have this problem when 
> querying ICEBERG.
> {code:java}
> --spark3.4.1+iceberg 1.4.2
> CREATE TABLE datacenter.dwd.b_std_trade (
>   uni_order_id STRING,
>   data_from BIGINT,
>   partner STRING,
>   plat_code STRING,
>   order_id STRING,
>   uni_shop_id STRING,
>   uni_id STRING,
>   guide_id STRING,
>   shop_id STRING,
>   plat_account STRING,
>   total_fee DOUBLE,
>   item_discount_fee DOUBLE,
>   trade_discount_fee DOUBLE,
>   adjust_fee DOUBLE,
>   post_fee DOUBLE,
>   discount_rate DOUBLE,
>   payment_no_postfee DOUBLE,
>   payment DOUBLE,
>   pay_time STRING,
>   product_num BIGINT,
>   order_status STRING,
>   is_refund STRING,
>   refund_fee DOUBLE,
>   insert_time STRING,
>   created STRING,
>   endtime STRING,
>   modified STRING,
>   trade_type STRING,
>   receiver_name STRING,
>   receiver_country STRING,
>   receiver_state STRING,
>   receiver_city STRING,
>   receiver_district STRING,
>   receiver_town STRING,
>   receiver_address STRING,
>   receiver_mobile STRING,
>   trade_source STRING,
>   delivery_type STRING,
>   consign_time STRING,
>   orders_num BIGINT,
>   is_presale BIGINT,
>   presale_status STRING,
>   first_fee_paytime STRING,
>   last_fee_paytime STRING,
>   first_paid_fee DOUBLE,
>   tenant STRING,
>   tidb_modified STRING,
>   step_paid_fee DOUBLE,
>   seller_flag STRING,
>   is_used_store_card BIGINT,
>   store_card_used DOUBLE,
>   store_card_basic_used DOUBLE,
>   store_card_expand_used DOUBLE,
>   order_promotion_num BIGINT,
>   item_promotion_num BIGINT,
>   buyer_remark STRING,
>   seller_remark STRING,
>   trade_business_type STRING)
> USING iceberg
> PARTITIONED BY (uni_shop_id, truncate(4, created))
> LOCATION '/iceberg-catalog/warehouse/dwd/b_std_trade'
> TBLPROPERTIES (
>   'current-snapshot-id' = '7217819472703702905',
>   'format' = 'iceberg/orc',
>   'format-version' = '1',
>   'hive.stored-as' = 'iceberg',
>   'read.orc.vectorization.enabled' = 'true',
>   'sort-order' = 'uni_shop_id ASC NULLS FIRST, created ASC NULLS FIRST',
>   'write.distribution-mode' = 'hash',
>   'write.format.default' = 'orc',
>   'write.metadata.delete-after-commit.enabled' = 'true',
>   'write.metadata.previous-versions-max' = '3',
>   'write.orc.bloom.filter.columns' = 'order_id',
>   'write.orc.compression-codec' = 'zstd')
> --hive-iceberg
>  CREATE EXTERNAL TABLE iceberg_dwd.b_std_trade 
>  STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler' 
> LOCATION 'hdfs:///iceberg-catalog/warehouse/dwd/b_std_trade'
> TBLPROPERTIES 
> ('iceberg.catalog'='location_based_table','engine.hive.enabled'='true');
> select * from iceberg_dwd.b_std_trade 
> where uni_shop_id = 'TEST|1' limit 10  --10 rows
> select *
> from ( 
> select * from iceberg_dwd.b_std_trade 
> where uni_shop_id = 'TEST|1' limit 10
> ) t1;   --10 rows
> select uni_shop_id
> from ( 
> select * from iceberg_dwd.b_std_trade 
> where uni_shop_id = 'TEST|1' limit 10
> ) t1;  --0 rows
> select uni_shop_id
> from ( 
> select uni_shop_id as uni_shop_id from iceberg_dwd.b_std_trade 
> where uni_shop_id = 'TEST|1' limit 10
> ) t1;  --0 rows
> --hive-orc
> select uni_shop_id
> from ( 
> select * from iceberg_dwd.trade_test 
> where uni_shop_id = 'TEST|1' limit 10
> ) t1;--10 ROWS{code}
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)





