[jira] [Commented] (HIVE-27898) When CBO is disabled, HIVE cannot query ICEBERG tables in nested subqueries properly

2024-02-22 Thread Butao Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17819652#comment-17819652
 ] 

Butao Zhang commented on HIVE-27898:


[~dkuzmenko] sure, i removed the 4.0 label. 

Based on the test i did, even though this is a bug, we also have a workaroud. 
This ticket should't block 4.0 release.

> When CBO is disabled, HIVE cannot query ICEBERG tables in nested subqueries 
> properly
> 
>
> Key: HIVE-27898
> URL: https://issues.apache.org/jira/browse/HIVE-27898
> Project: Hive
>  Issue Type: Bug
>  Components: Iceberg integration
>Reporter: yongzhi.shao
>Priority: Critical
> Attachments: hive-site.xml
>
>
> Currently, we found that when using HIVE4-BETA1 version, if we use ICEBERG 
> table in the subquery, we can't get any data in the end.
> I have used HIVE3-TEZ for cross validation and HIVE3 does not have this 
> problem when querying ICEBERG.
> {code:java}
> --spark3.4.1+iceberg 1.4.2
> CREATE TABLE datacenter.dwd.b_std_trade (
>   uni_order_id STRING,
>   data_from BIGINT,
>   partner STRING,
>   plat_code STRING,
>   order_id STRING,
>   uni_shop_id STRING,
>   uni_id STRING,
>   guide_id STRING,
>   shop_id STRING,
>   plat_account STRING,
>   total_fee DOUBLE,
>   item_discount_fee DOUBLE,
>   trade_discount_fee DOUBLE,
>   adjust_fee DOUBLE,
>   post_fee DOUBLE,
>   discount_rate DOUBLE,
>   payment_no_postfee DOUBLE,
>   payment DOUBLE,
>   pay_time STRING,
>   product_num BIGINT,
>   order_status STRING,
>   is_refund STRING,
>   refund_fee DOUBLE,
>   insert_time STRING,
>   created STRING,
>   endtime STRING,
>   modified STRING,
>   trade_type STRING,
>   receiver_name STRING,
>   receiver_country STRING,
>   receiver_state STRING,
>   receiver_city STRING,
>   receiver_district STRING,
>   receiver_town STRING,
>   receiver_address STRING,
>   receiver_mobile STRING,
>   trade_source STRING,
>   delivery_type STRING,
>   consign_time STRING,
>   orders_num BIGINT,
>   is_presale BIGINT,
>   presale_status STRING,
>   first_fee_paytime STRING,
>   last_fee_paytime STRING,
>   first_paid_fee DOUBLE,
>   tenant STRING,
>   tidb_modified STRING,
>   step_paid_fee DOUBLE,
>   seller_flag STRING,
>   is_used_store_card BIGINT,
>   store_card_used DOUBLE,
>   store_card_basic_used DOUBLE,
>   store_card_expand_used DOUBLE,
>   order_promotion_num BIGINT,
>   item_promotion_num BIGINT,
>   buyer_remark STRING,
>   seller_remark STRING,
>   trade_business_type STRING)
> USING iceberg
> PARTITIONED BY (uni_shop_id, truncate(4, created))
> LOCATION '/iceberg-catalog/warehouse/dwd/b_std_trade'
> TBLPROPERTIES (
>   'current-snapshot-id' = '7217819472703702905',
>   'format' = 'iceberg/orc',
>   'format-version' = '1',
>   'hive.stored-as' = 'iceberg',
>   'read.orc.vectorization.enabled' = 'true',
>   'sort-order' = 'uni_shop_id ASC NULLS FIRST, created ASC NULLS FIRST',
>   'write.distribution-mode' = 'hash',
>   'write.format.default' = 'orc',
>   'write.metadata.delete-after-commit.enabled' = 'true',
>   'write.metadata.previous-versions-max' = '3',
>   'write.orc.bloom.filter.columns' = 'order_id',
>   'write.orc.compression-codec' = 'zstd')
> --hive-iceberg
>  CREATE EXTERNAL TABLE iceberg_dwd.b_std_trade 
>  STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler' 
> LOCATION 'hdfs:///iceberg-catalog/warehouse/dwd/b_std_trade'
> TBLPROPERTIES 
> ('iceberg.catalog'='location_based_table','engine.hive.enabled'='true');
> select * from iceberg_dwd.b_std_trade 
> where uni_shop_id = 'TEST|1' limit 10  --10 rows
> select *
> from ( 
> select * from iceberg_dwd.b_std_trade 
> where uni_shop_id = 'TEST|1' limit 10
> ) t1;   --10 rows
> select uni_shop_id
> from ( 
> select * from iceberg_dwd.b_std_trade 
> where uni_shop_id = 'TEST|1' limit 10
> ) t1;  --0 rows
> select uni_shop_id
> from ( 
> select uni_shop_id as uni_shop_id from iceberg_dwd.b_std_trade 
> where uni_shop_id = 'TEST|1' limit 10
> ) t1;  --0 rows
> --hive-orc
> select uni_shop_id
> from ( 
> select * from iceberg_dwd.trade_test 
> where uni_shop_id = 'TEST|1' limit 10
> ) t1;--10 ROWS{code}
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-27898) When CBO is disabled, HIVE cannot query ICEBERG tables in nested subqueries properly

2024-02-22 Thread Denys Kuzmenko (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17819606#comment-17819606
 ] 

Denys Kuzmenko commented on HIVE-27898:
---

[~zhangbutao], could we remove this from 4.0 blockers?

> When CBO is disabled, HIVE cannot query ICEBERG tables in nested subqueries 
> properly
> 
>
> Key: HIVE-27898
> URL: https://issues.apache.org/jira/browse/HIVE-27898
> Project: Hive
>  Issue Type: Bug
>  Components: Iceberg integration
>Affects Versions: 4.0.0, 4.0.0-beta-1
>Reporter: yongzhi.shao
>Priority: Critical
> Attachments: hive-site.xml
>
>
> Currently, we found that when using HIVE4-BETA1 version, if we use ICEBERG 
> table in the subquery, we can't get any data in the end.
> I have used HIVE3-TEZ for cross validation and HIVE3 does not have this 
> problem when querying ICEBERG.
> {code:java}
> --spark3.4.1+iceberg 1.4.2
> CREATE TABLE datacenter.dwd.b_std_trade (
>   uni_order_id STRING,
>   data_from BIGINT,
>   partner STRING,
>   plat_code STRING,
>   order_id STRING,
>   uni_shop_id STRING,
>   uni_id STRING,
>   guide_id STRING,
>   shop_id STRING,
>   plat_account STRING,
>   total_fee DOUBLE,
>   item_discount_fee DOUBLE,
>   trade_discount_fee DOUBLE,
>   adjust_fee DOUBLE,
>   post_fee DOUBLE,
>   discount_rate DOUBLE,
>   payment_no_postfee DOUBLE,
>   payment DOUBLE,
>   pay_time STRING,
>   product_num BIGINT,
>   order_status STRING,
>   is_refund STRING,
>   refund_fee DOUBLE,
>   insert_time STRING,
>   created STRING,
>   endtime STRING,
>   modified STRING,
>   trade_type STRING,
>   receiver_name STRING,
>   receiver_country STRING,
>   receiver_state STRING,
>   receiver_city STRING,
>   receiver_district STRING,
>   receiver_town STRING,
>   receiver_address STRING,
>   receiver_mobile STRING,
>   trade_source STRING,
>   delivery_type STRING,
>   consign_time STRING,
>   orders_num BIGINT,
>   is_presale BIGINT,
>   presale_status STRING,
>   first_fee_paytime STRING,
>   last_fee_paytime STRING,
>   first_paid_fee DOUBLE,
>   tenant STRING,
>   tidb_modified STRING,
>   step_paid_fee DOUBLE,
>   seller_flag STRING,
>   is_used_store_card BIGINT,
>   store_card_used DOUBLE,
>   store_card_basic_used DOUBLE,
>   store_card_expand_used DOUBLE,
>   order_promotion_num BIGINT,
>   item_promotion_num BIGINT,
>   buyer_remark STRING,
>   seller_remark STRING,
>   trade_business_type STRING)
> USING iceberg
> PARTITIONED BY (uni_shop_id, truncate(4, created))
> LOCATION '/iceberg-catalog/warehouse/dwd/b_std_trade'
> TBLPROPERTIES (
>   'current-snapshot-id' = '7217819472703702905',
>   'format' = 'iceberg/orc',
>   'format-version' = '1',
>   'hive.stored-as' = 'iceberg',
>   'read.orc.vectorization.enabled' = 'true',
>   'sort-order' = 'uni_shop_id ASC NULLS FIRST, created ASC NULLS FIRST',
>   'write.distribution-mode' = 'hash',
>   'write.format.default' = 'orc',
>   'write.metadata.delete-after-commit.enabled' = 'true',
>   'write.metadata.previous-versions-max' = '3',
>   'write.orc.bloom.filter.columns' = 'order_id',
>   'write.orc.compression-codec' = 'zstd')
> --hive-iceberg
>  CREATE EXTERNAL TABLE iceberg_dwd.b_std_trade 
>  STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler' 
> LOCATION 'hdfs:///iceberg-catalog/warehouse/dwd/b_std_trade'
> TBLPROPERTIES 
> ('iceberg.catalog'='location_based_table','engine.hive.enabled'='true');
> select * from iceberg_dwd.b_std_trade 
> where uni_shop_id = 'TEST|1' limit 10  --10 rows
> select *
> from ( 
> select * from iceberg_dwd.b_std_trade 
> where uni_shop_id = 'TEST|1' limit 10
> ) t1;   --10 rows
> select uni_shop_id
> from ( 
> select * from iceberg_dwd.b_std_trade 
> where uni_shop_id = 'TEST|1' limit 10
> ) t1;  --0 rows
> select uni_shop_id
> from ( 
> select uni_shop_id as uni_shop_id from iceberg_dwd.b_std_trade 
> where uni_shop_id = 'TEST|1' limit 10
> ) t1;  --0 rows
> --hive-orc
> select uni_shop_id
> from ( 
> select * from iceberg_dwd.trade_test 
> where uni_shop_id = 'TEST|1' limit 10
> ) t1;--10 ROWS{code}
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-27898) When CBO is disabled, HIVE cannot query ICEBERG tables in nested subqueries properly

2024-02-21 Thread yongzhi.shao (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17819493#comment-17819493
 ] 

yongzhi.shao commented on HIVE-27898:
-

testdb.test_data_02 is iceberg table?

> When CBO is disabled, HIVE cannot query ICEBERG tables in nested subqueries 
> properly
> 
>
> Key: HIVE-27898
> URL: https://issues.apache.org/jira/browse/HIVE-27898
> Project: Hive
>  Issue Type: Bug
>  Components: Iceberg integration
>Affects Versions: 4.0.0, 4.0.0-beta-1
>Reporter: yongzhi.shao
>Priority: Critical
> Attachments: hive-site.xml
>
>
> Currently, we found that when using HIVE4-BETA1 version, if we use ICEBERG 
> table in the subquery, we can't get any data in the end.
> I have used HIVE3-TEZ for cross validation and HIVE3 does not have this 
> problem when querying ICEBERG.
> {code:java}
> --spark3.4.1+iceberg 1.4.2
> CREATE TABLE datacenter.dwd.b_std_trade (
>   uni_order_id STRING,
>   data_from BIGINT,
>   partner STRING,
>   plat_code STRING,
>   order_id STRING,
>   uni_shop_id STRING,
>   uni_id STRING,
>   guide_id STRING,
>   shop_id STRING,
>   plat_account STRING,
>   total_fee DOUBLE,
>   item_discount_fee DOUBLE,
>   trade_discount_fee DOUBLE,
>   adjust_fee DOUBLE,
>   post_fee DOUBLE,
>   discount_rate DOUBLE,
>   payment_no_postfee DOUBLE,
>   payment DOUBLE,
>   pay_time STRING,
>   product_num BIGINT,
>   order_status STRING,
>   is_refund STRING,
>   refund_fee DOUBLE,
>   insert_time STRING,
>   created STRING,
>   endtime STRING,
>   modified STRING,
>   trade_type STRING,
>   receiver_name STRING,
>   receiver_country STRING,
>   receiver_state STRING,
>   receiver_city STRING,
>   receiver_district STRING,
>   receiver_town STRING,
>   receiver_address STRING,
>   receiver_mobile STRING,
>   trade_source STRING,
>   delivery_type STRING,
>   consign_time STRING,
>   orders_num BIGINT,
>   is_presale BIGINT,
>   presale_status STRING,
>   first_fee_paytime STRING,
>   last_fee_paytime STRING,
>   first_paid_fee DOUBLE,
>   tenant STRING,
>   tidb_modified STRING,
>   step_paid_fee DOUBLE,
>   seller_flag STRING,
>   is_used_store_card BIGINT,
>   store_card_used DOUBLE,
>   store_card_basic_used DOUBLE,
>   store_card_expand_used DOUBLE,
>   order_promotion_num BIGINT,
>   item_promotion_num BIGINT,
>   buyer_remark STRING,
>   seller_remark STRING,
>   trade_business_type STRING)
> USING iceberg
> PARTITIONED BY (uni_shop_id, truncate(4, created))
> LOCATION '/iceberg-catalog/warehouse/dwd/b_std_trade'
> TBLPROPERTIES (
>   'current-snapshot-id' = '7217819472703702905',
>   'format' = 'iceberg/orc',
>   'format-version' = '1',
>   'hive.stored-as' = 'iceberg',
>   'read.orc.vectorization.enabled' = 'true',
>   'sort-order' = 'uni_shop_id ASC NULLS FIRST, created ASC NULLS FIRST',
>   'write.distribution-mode' = 'hash',
>   'write.format.default' = 'orc',
>   'write.metadata.delete-after-commit.enabled' = 'true',
>   'write.metadata.previous-versions-max' = '3',
>   'write.orc.bloom.filter.columns' = 'order_id',
>   'write.orc.compression-codec' = 'zstd')
> --hive-iceberg
>  CREATE EXTERNAL TABLE iceberg_dwd.b_std_trade 
>  STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler' 
> LOCATION 'hdfs:///iceberg-catalog/warehouse/dwd/b_std_trade'
> TBLPROPERTIES 
> ('iceberg.catalog'='location_based_table','engine.hive.enabled'='true');
> select * from iceberg_dwd.b_std_trade 
> where uni_shop_id = 'TEST|1' limit 10  --10 rows
> select *
> from ( 
> select * from iceberg_dwd.b_std_trade 
> where uni_shop_id = 'TEST|1' limit 10
> ) t1;   --10 rows
> select uni_shop_id
> from ( 
> select * from iceberg_dwd.b_std_trade 
> where uni_shop_id = 'TEST|1' limit 10
> ) t1;  --0 rows
> select uni_shop_id
> from ( 
> select uni_shop_id as uni_shop_id from iceberg_dwd.b_std_trade 
> where uni_shop_id = 'TEST|1' limit 10
> ) t1;  --0 rows
> --hive-orc
> select uni_shop_id
> from ( 
> select * from iceberg_dwd.trade_test 
> where uni_shop_id = 'TEST|1' limit 10
> ) t1;--10 ROWS{code}
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-27898) When CBO is disabled, HIVE cannot query ICEBERG tables in nested subqueries properly

2024-02-21 Thread Butao Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17819491#comment-17819491
 ] 

Butao Zhang commented on HIVE-27898:


*After disable cbo, the query is also fine.*

 

// set conf in beeline:

set hive.fetch.task.conversion=none;  //enforce launching tez task
set hive.explain.user=false;

set hive.cbo.enable=false;

 
{code:java}
0: jdbc:hive2://127.0.0.1:1/default> show create table testdb.test_data_02;
INFO  : Compiling 
command(queryId=hive_20240222143047_1bb04d39-fc35-40e3-a897-d4211d3f5e9e): show 
create table  testdb.test_data_02
INFO  : Semantic Analysis Completed (retrial = false)
INFO  : Created Hive schema: 
Schema(fieldSchemas:[FieldSchema(name:createtab_stmt, type:string, comment:from 
deserializer)], properties:null)
INFO  : Completed compiling 
command(queryId=hive_20240222143047_1bb04d39-fc35-40e3-a897-d4211d3f5e9e); Time 
taken: 0.045 seconds
INFO  : Concurrency mode is disabled, not creating a lock manager
INFO  : Executing 
command(queryId=hive_20240222143047_1bb04d39-fc35-40e3-a897-d4211d3f5e9e): show 
create table  testdb.test_data_02
INFO  : Starting task [Stage-0:DDL] in serial mode
INFO  : Completed executing 
command(queryId=hive_20240222143047_1bb04d39-fc35-40e3-a897-d4211d3f5e9e); Time 
taken: 0.057 seconds
++
|                   createtab_stmt                   |
++
| CREATE EXTERNAL TABLE `testdb`.`test_data_02`(     |
|   `id` string,                                     |
|   `name` string)                                   |
| PARTITIONED BY SPEC (                              |
| `name`)                                            |
| ROW FORMAT SERDE                                   |
|   'org.apache.iceberg.mr.hive.HiveIcebergSerDe'    |
| STORED BY                                          |
|   'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler'  |
| WITH SERDEPROPERTIES (                             |
|   'serialization.format'='1')                      |
| LOCATION                                           |
|   'hdfs://127.0.0.1:8028/tmp/testiceberg/default/test_data_02' |
| TBLPROPERTIES (                                    |
|   'bucketing_version'='2',                         |
|   'current-snapshot-id'='6130266132019241842',     |
|   'engine.hive.enabled'='true',                    |
|   'format-version'='2',                            |
|   'iceberg.catalog'='location_based_table',        |
|   'table_type'='ICEBERG',                          |
|   'transient_lastDdlTime'='1708508832')            |
++
21 rows selected (0.482 seconds)
0: jdbc:hive2://127.0.0.1:1/default>
0: jdbc:hive2://127.0.0.1:1/default>
0: jdbc:hive2://127.0.0.1:1/default> explain select * from (select *  from 
testdb.test_data_02 limit 10) s1;
INFO  : Compiling 
command(queryId=hive_20240222143051_e69515c5-6c8d-4bd7-8973-faa3f9ed6d97): 
explain select * from (select *  from testdb.test_data_02 limit 10) s1
INFO  : Semantic Analysis Completed (retrial = false)
INFO  : Created Hive schema: Schema(fieldSchemas:[FieldSchema(name:Explain, 
type:string, comment:null)], properties:null)
INFO  : Completed compiling 
command(queryId=hive_20240222143051_e69515c5-6c8d-4bd7-8973-faa3f9ed6d97); Time 
taken: 0.09 seconds
INFO  : Concurrency mode is disabled, not creating a lock manager
INFO  : Executing 
command(queryId=hive_20240222143051_e69515c5-6c8d-4bd7-8973-faa3f9ed6d97): 
explain select * from (select *  from testdb.test_data_02 limit 10) s1
INFO  : Starting task [Stage-3:EXPLAIN] in serial mode
INFO  : Completed executing 
command(queryId=hive_20240222143051_e69515c5-6c8d-4bd7-8973-faa3f9ed6d97); Time 
taken: 0.008 seconds
++
|                      Explain                       |
++
| STAGE DEPENDENCIES:                                |
|   Stage-1 is a root stage                          |
|   Stage-0 depends on stages: Stage-1               |
|                                                    |
| STAGE PLANS:                                       |
|   Stage: Stage-1                                   |
|     Tez                                            |
|       DagId: hive_20240222143051_e69515c5-6c8d-4bd7-8973-faa3f9ed6d97:18 |
|       Edges:                                       |
|         Reducer 2 <- Map 1 (CUSTOM_SIMPLE_EDGE)    |
|       DagName: hive_20240222143051_e69515c5-6c8d-4bd7-8973-faa3f9ed6d97:18 |
|       Vertices:                                    |
|         Map 1                                      |
|             Map Operator Tree:                     |
|                 TableScan                          |
|                   alias: test_data_02              |
|