[jira] [Updated] (HIVE-27924) Incremental rebuild goes wrong when inserts and deletes overlap between the source tables
[ https://issues.apache.org/jira/browse/HIVE-27924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Krisztian Kasa updated HIVE-27924:
----------------------------------
    Fix Version/s: 4.1.0
       Resolution: Fixed
           Status: Resolved  (was: Patch Available)

> Key: HIVE-27924
> URL: https://issues.apache.org/jira/browse/HIVE-27924
> Project: Hive
> Issue Type: Bug
> Components: Materialized views
> Affects Versions: 4.0.0-beta-1
> Environment:
>  * Docker version: 19.03.6
>  * Hive version: 4.0.0-beta-1
>  * Driver version: Hive JDBC (4.0.0-beta-1)
>  * Beeline version: 4.0.0-beta-1
> Reporter: Wenhao Li
> Assignee: Krisztian Kasa
> Priority: Critical
> Labels: bug, hive, hive-4.1.0-must, known_issue, materializedviews, pull-request-available
> Fix For: 4.1.0
> Attachments: 截图.PNG, 截图1.PNG, 截图2.PNG, 截图3.PNG, 截图4.PNG, 截图5.PNG, 截图6.PNG, 截图7.PNG, 截图8.PNG, 截图9.PNG
>
> h1. Summary
> The incremental rebuild plan and its execution output are incorrect when one side of the join has inserted (or deleted) join keys that the other side has deleted (or inserted) since the last rebuild (note the order). Tuples that were never present simultaneously should not interact with one another; in particular, one table's inserts must not be joined against the other table's deletes.
> h1. Related Test Case
> The bug was discovered while reproducing the test case ??hive/ql/src/test/queries/clientpositive/materialized_view_create_rewrite_5.q??.
> h1. Steps to Reproduce the Issue
> # Configuration:
> {code:sql}
> SET hive.vectorized.execution.enabled=false;
> set hive.support.concurrency=true;
> set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
> set hive.strict.checks.cartesian.product=false;
> set hive.materializedview.rewriting=true;
> {code}
> # Create the first source table:
> {code:sql}
> create table cmv_basetable_n6 (a int, b varchar(256), c decimal(10,2), d int)
> stored as orc TBLPROPERTIES ('transactional'='true');
> {code}
> # Populate it:
> {code:sql}
> insert into cmv_basetable_n6 values
> (1, 'alfred', 10.30, 2),
> (1, 'charlie', 20.30, 2);
> {code}
> # Create the second source table:
> {code:sql}
> create table cmv_basetable_2_n3 (a int, b varchar(256), c decimal(10,2), d int)
> stored as orc TBLPROPERTIES ('transactional'='true');
> {code}
> # Populate it:
> {code:sql}
> insert into cmv_basetable_2_n3 values
> (1, 'bob', 30.30, 2),
> (1, 'bonnie', 40.30, 2);
> {code}
> # Create the materialized view:
> {code:sql}
> CREATE MATERIALIZED VIEW cmv_mat_view_n6 TBLPROPERTIES ('transactional'='true') AS
> SELECT cmv_basetable_n6.a, cmv_basetable_2_n3.c
> FROM cmv_basetable_n6 JOIN cmv_basetable_2_n3 ON (cmv_basetable_n6.a = cmv_basetable_2_n3.a)
> WHERE cmv_basetable_2_n3.c > 10.0;
> {code}
> # {code:sql}
> show tables;
> {code}
> !截图.PNG!
> # Select tuples, including deleted rows and virtual columns, from the MV and the source tables. The MV is correctly built upon creation:
> {code:sql}
> SELECT ROW__IS__DELETED, ROW__ID, * FROM cmv_mat_view_n6('acid.fetch.deleted.rows'='true');
> {code}
> !截图1.PNG!
> # {code:sql}
> SELECT ROW__IS__DELETED, ROW__ID, * FROM cmv_basetable_n6('acid.fetch.deleted.rows'='true');
> {code}
> !截图2.PNG!
> # {code:sql}
> SELECT ROW__IS__DELETED, ROW__ID, * FROM cmv_basetable_2_n3('acid.fetch.deleted.rows'='true');
> {code}
> !截图3.PNG!
> # Now make an insert into the LHS and a delete on the RHS source table:
> {code:sql}
> insert into cmv_basetable_n6 values
> (1, 'kevin', 50.30, 2);
> DELETE FROM cmv_basetable_2_n3 WHERE b = 'bonnie';
> {code}
> # Select again to see what happened:
> {code:sql}
> SELECT ROW__IS__DELETED, ROW__ID, * FROM cmv_basetable_n6('acid.fetch.deleted.rows'='true');
> {code}
> !截图4.PNG!
> # {code:sql}
> SELECT ROW__IS__DELETED, ROW__ID, * FROM cmv_basetable_2_n3('acid.fetch.deleted.rows'='true');
> {code}
> !截图5.PNG!
> # Use {{EXPLAIN CBO}} to produce the incremental rebuild plan for the MV; the plan is already incorrect:
> {code:sql}
> EXPLAIN CBO
> ALTER MATERIALIZED VIEW cmv_mat_view_n6 REBUILD;
> {code}
> !截图6.PNG!
> # Rebuild the MV and observe the (incorrect) results:
> {code:sql}
> ALTER MATERIALIZED VIEW cmv_mat_view_n6 REBUILD;
> SELECT ROW__IS__DELETED, ROW__ID, * FROM cmv_mat_view_n6('acid.fetch.deleted.rows'='true');
> {code}
> !截图7.PNG!
> # Run the MV definition directly. It also returns incorrect results, because the MV is enabled for MV-based query rewrite, i.e. the query is answered from the (stale) MV contents:
> {code:sql}
> SELECT cmv_basetable_n6.a, cmv_basetable_2_n3.c
> FROM cmv_basetable_n6 JOIN cmv_basetable_2_n3 ON (cmv_basetable_n6.a = cmv_basetable_2_n3.a)
> WHERE cmv_basetable_2_n3.c > 10.0;
> {code}
> !截图8.PNG!
> # Disable MV-based query rewrite
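The faulty interaction in the repro above can be modeled outside Hive. The following Python sketch is illustrative only (it is not Hive's implementation): tables are plain sets, so duplicate-row multiplicity and ACID row ids are ignored, and the names `view`, `lhs_ins`, `rhs_del` are invented for this example. It contrasts a correct incremental rebuild of the join view with a variant that, like the buggy plan, lets the LHS insert ('kevin') see the RHS row ('bonnie') that was deleted in the same window:

```python
def view(lhs, rhs):
    """Full evaluation of the MV definition: join on `a`, keep c > 10.0,
    project (a, c). lhs rows are (a, b); rhs rows are (a, c)."""
    return {(la, rc) for (la, lb) in lhs for (ra, rc) in rhs
            if la == ra and rc > 10.0}

# State when the MV was created.
lhs0 = {(1, 'alfred'), (1, 'charlie')}
rhs0 = {(1, 30.30), (1, 40.30)}   # bob, bonnie

# Deltas since the last rebuild: an insert on the LHS, a delete on the RHS.
lhs_ins = {(1, 'kevin')}
rhs_del = {(1, 40.30)}            # bonnie

lhs1 = lhs0 | lhs_ins             # current LHS state
rhs1 = rhs0 - rhs_del             # current RHS state

# Ground truth: evaluate the definition against the new state.
expected = view(lhs1, rhs1)

# Correct incremental rebuild: retract join partners of deleted rows
# against the state they were actually joined with, and join new rows
# only against rows that still survive.
mv = view(lhs0, rhs0)
mv -= view(lhs0, rhs_del)         # retract bonnie's join partners
mv |= view(lhs_ins, rhs1)         # kevin joins only surviving RHS rows

# Buggy variant: the new LHS insert is joined against the *old* RHS state,
# so kevin meets the already-deleted bonnie and (1, 40.30) wrongly appears.
buggy = (view(lhs0, rhs0) - view(lhs0, rhs_del)) | view(lhs_ins, rhs0)
```

Under this model the correct rebuild matches a full re-evaluation, while the buggy variant keeps the retracted row alive, which is the symptom shown in 截图7.PNG.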
[jira] [Commented] (HIVE-27924) Incremental rebuild goes wrong when inserts and deletes overlap between the source tables
[ https://issues.apache.org/jira/browse/HIVE-27924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17819506#comment-17819506 ]

Krisztian Kasa commented on HIVE-27924:
---------------------------------------
Merged to master. Thanks [~dkuzmenko] for reviewing the patch and [~wenhaoli] for the detailed repro steps.
[jira] [Commented] (HIVE-27898) When CBO is disabled, HIVE cannot query ICEBERG tables in nested subqueries properly
[ https://issues.apache.org/jira/browse/HIVE-27898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17819493#comment-17819493 ]

yongzhi.shao commented on HIVE-27898:
-------------------------------------
Is testdb.test_data_02 an Iceberg table?

> Key: HIVE-27898
> URL: https://issues.apache.org/jira/browse/HIVE-27898
> Project: Hive
> Issue Type: Bug
> Components: Iceberg integration
> Affects Versions: 4.0.0, 4.0.0-beta-1
> Reporter: yongzhi.shao
> Priority: Critical
> Attachments: hive-site.xml
>
> We found that with Hive 4.0.0-beta-1, a query that references an Iceberg table inside a nested subquery returns no data. Cross-validation with Hive 3 on Tez shows that Hive 3 does not have this problem when querying Iceberg.
> {code:sql}
> -- Spark 3.4.1 + Iceberg 1.4.2
> CREATE TABLE datacenter.dwd.b_std_trade (
>   uni_order_id STRING, data_from BIGINT, partner STRING, plat_code STRING,
>   order_id STRING, uni_shop_id STRING, uni_id STRING, guide_id STRING,
>   shop_id STRING, plat_account STRING, total_fee DOUBLE, item_discount_fee DOUBLE,
>   trade_discount_fee DOUBLE, adjust_fee DOUBLE, post_fee DOUBLE, discount_rate DOUBLE,
>   payment_no_postfee DOUBLE, payment DOUBLE, pay_time STRING, product_num BIGINT,
>   order_status STRING, is_refund STRING, refund_fee DOUBLE, insert_time STRING,
>   created STRING, endtime STRING, modified STRING, trade_type STRING,
>   receiver_name STRING, receiver_country STRING, receiver_state STRING, receiver_city STRING,
>   receiver_district STRING, receiver_town STRING, receiver_address STRING, receiver_mobile STRING,
>   trade_source STRING, delivery_type STRING, consign_time STRING, orders_num BIGINT,
>   is_presale BIGINT, presale_status STRING, first_fee_paytime STRING, last_fee_paytime STRING,
>   first_paid_fee DOUBLE, tenant STRING, tidb_modified STRING, step_paid_fee DOUBLE,
>   seller_flag STRING, is_used_store_card BIGINT, store_card_used DOUBLE, store_card_basic_used DOUBLE,
>   store_card_expand_used DOUBLE, order_promotion_num BIGINT, item_promotion_num BIGINT,
>   buyer_remark STRING, seller_remark STRING, trade_business_type STRING)
> USING iceberg
> PARTITIONED BY (uni_shop_id, truncate(4, created))
> LOCATION '/iceberg-catalog/warehouse/dwd/b_std_trade'
> TBLPROPERTIES (
>   'current-snapshot-id' = '7217819472703702905',
>   'format' = 'iceberg/orc',
>   'format-version' = '1',
>   'hive.stored-as' = 'iceberg',
>   'read.orc.vectorization.enabled' = 'true',
>   'sort-order' = 'uni_shop_id ASC NULLS FIRST, created ASC NULLS FIRST',
>   'write.distribution-mode' = 'hash',
>   'write.format.default' = 'orc',
>   'write.metadata.delete-after-commit.enabled' = 'true',
>   'write.metadata.previous-versions-max' = '3',
>   'write.orc.bloom.filter.columns' = 'order_id',
>   'write.orc.compression-codec' = 'zstd');
>
> -- Hive (Iceberg)
> CREATE EXTERNAL TABLE iceberg_dwd.b_std_trade
> STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler'
> LOCATION 'hdfs:///iceberg-catalog/warehouse/dwd/b_std_trade'
> TBLPROPERTIES ('iceberg.catalog'='location_based_table','engine.hive.enabled'='true');
>
> select * from iceberg_dwd.b_std_trade
> where uni_shop_id = 'TEST|1' limit 10;  -- 10 rows
>
> select *
> from (
>   select * from iceberg_dwd.b_std_trade
>   where uni_shop_id = 'TEST|1' limit 10
> ) t1;                                   -- 10 rows
>
> select uni_shop_id
> from (
>   select * from iceberg_dwd.b_std_trade
>   where uni_shop_id = 'TEST|1' limit 10
> ) t1;                                   -- 0 rows
>
> select uni_shop_id
> from (
>   select uni_shop_id as uni_shop_id from iceberg_dwd.b_std_trade
>   where uni_shop_id = 'TEST|1' limit 10
> ) t1;                                   -- 0 rows
>
> -- Hive (ORC)
> select uni_shop_id
> from (
>   select * from iceberg_dwd.trade_test
>   where uni_shop_id = 'TEST|1' limit 10
> ) t1;                                   -- 10 rows
> {code}

-- This message was sent by Atlassian Jira (v8.20.10#820010)
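The failing queries above all share one shape: a projection over a LIMIT-ed subquery, and such a projection must never change the row count. A small Python model of that invariant follows; it is illustrative only, and the `scan`, `limit`, and `project` helpers are hypothetical stand-ins for the plan operators, not Hive code:

```python
# Model the plan operators as functions over lists of dict rows.
def scan(table, pred):
    """Filtered table scan."""
    return [row for row in table if pred(row)]

def limit(rows, n):
    """LIMIT n."""
    return rows[:n]

def project(rows, cols):
    """Column projection; must preserve row count."""
    return [{c: row[c] for c in cols} for row in rows]

# Toy stand-in for iceberg_dwd.b_std_trade with 20 matching rows.
trade = [{'uni_shop_id': 'TEST|1', 'order_id': str(i)} for i in range(20)]

# select * from (select * from trade where uni_shop_id = 'TEST|1' limit 10) t1
inner = limit(scan(trade, lambda r: r['uni_shop_id'] == 'TEST|1'), 10)

# select uni_shop_id from (...) t1 -- same subquery, projected to one column
outer = project(inner, ['uni_shop_id'])

# Both must have 10 rows; Hive 4.0.0-beta-1 with CBO disabled returned
# 0 rows for the projected variant against Iceberg tables.
```

The report shows the invariant holds for a plain ORC table but is violated for the Iceberg table once CBO is disabled, which points at the non-CBO plan for Iceberg scans rather than the data itself.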
[jira] (HIVE-27898) When CBO is disabled, HIVE cannot query ICEBERG tables in nested subqueries properly
[ https://issues.apache.org/jira/browse/HIVE-27898 ]

yongzhi.shao deleted comment on HIVE-27898:
-------------------------------------------
was (Author: lisoda): testdb.test_data_02 is iceberg table?
[jira] [Commented] (HIVE-27898) When CBO is disabled, HIVE cannot query ICEBERG tables in nested subqueries properly
[ https://issues.apache.org/jira/browse/HIVE-27898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17819491#comment-17819491 ]

Butao Zhang commented on HIVE-27898:
------------------------------------
*After disabling CBO, the query is also fine.*

Configuration set in Beeline:
{code}
set hive.fetch.task.conversion=none;  -- enforce launching a Tez task
set hive.explain.user=false;
set hive.cbo.enable=false;
{code}

{code:java}
0: jdbc:hive2://127.0.0.1:1/default> show create table testdb.test_data_02;
INFO : Compiling command(queryId=hive_20240222143047_1bb04d39-fc35-40e3-a897-d4211d3f5e9e): show create table testdb.test_data_02
INFO : Semantic Analysis Completed (retrial = false)
INFO : Created Hive schema: Schema(fieldSchemas:[FieldSchema(name:createtab_stmt, type:string, comment:from deserializer)], properties:null)
INFO : Completed compiling command(queryId=hive_20240222143047_1bb04d39-fc35-40e3-a897-d4211d3f5e9e); Time taken: 0.045 seconds
INFO : Concurrency mode is disabled, not creating a lock manager
INFO : Executing command(queryId=hive_20240222143047_1bb04d39-fc35-40e3-a897-d4211d3f5e9e): show create table testdb.test_data_02
INFO : Starting task [Stage-0:DDL] in serial mode
INFO : Completed executing command(queryId=hive_20240222143047_1bb04d39-fc35-40e3-a897-d4211d3f5e9e); Time taken: 0.057 seconds
| createtab_stmt                                                |
| CREATE EXTERNAL TABLE `testdb`.`test_data_02`(                |
|   `id` string,                                                |
|   `name` string)                                              |
| PARTITIONED BY SPEC (                                         |
|   `name`)                                                     |
| ROW FORMAT SERDE                                              |
|   'org.apache.iceberg.mr.hive.HiveIcebergSerDe'               |
| STORED BY                                                     |
|   'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler'      |
| WITH SERDEPROPERTIES (                                        |
|   'serialization.format'='1')                                 |
| LOCATION                                                      |
|   'hdfs://127.0.0.1:8028/tmp/testiceberg/default/test_data_02' |
| TBLPROPERTIES (                                               |
|   'bucketing_version'='2',                                    |
|   'current-snapshot-id'='6130266132019241842',                |
|   'engine.hive.enabled'='true',                               |
|   'format-version'='2',                                       |
|   'iceberg.catalog'='location_based_table',                   |
|   'table_type'='ICEBERG',                                     |
|   'transient_lastDdlTime'='1708508832')                       |
21 rows selected (0.482 seconds)

0: jdbc:hive2://127.0.0.1:1/default> explain select * from (select * from testdb.test_data_02 limit 10) s1;
INFO : Compiling command(queryId=hive_20240222143051_e69515c5-6c8d-4bd7-8973-faa3f9ed6d97): explain select * from (select * from testdb.test_data_02 limit 10) s1
INFO : Semantic Analysis Completed (retrial = false)
INFO : Created Hive schema: Schema(fieldSchemas:[FieldSchema(name:Explain, type:string, comment:null)], properties:null)
INFO : Completed compiling command(queryId=hive_20240222143051_e69515c5-6c8d-4bd7-8973-faa3f9ed6d97); Time taken: 0.09 seconds
INFO : Concurrency mode is disabled, not creating a lock manager
INFO : Executing command(queryId=hive_20240222143051_e69515c5-6c8d-4bd7-8973-faa3f9ed6d97): explain select * from (select * from testdb.test_data_02 limit 10) s1
INFO : Starting task [Stage-3:EXPLAIN] in serial mode
INFO : Completed executing command(queryId=hive_20240222143051_e69515c5-6c8d-4bd7-8973-faa3f9ed6d97); Time taken: 0.008 seconds
| Explain                                                              |
| STAGE DEPENDENCIES:                                                  |
|   Stage-1 is a root stage                                            |
|   Stage-0 depends on stages: Stage-1                                 |
|                                                                      |
| STAGE PLANS:                                                         |
|   Stage: Stage-1                                                     |
|     Tez                                                              |
|       DagId: hive_20240222143051_e69515c5-6c8d-4bd7-8973-faa3f9ed6d97:18 |
|       Edges:                                                         |
|         Reducer 2 <- Map 1 (CUSTOM_SIMPLE_EDGE)                      |
|       DagName: hive_20240222143051_e69515c5-6c8d-4bd7-8973-faa3f9ed6d97:18 |
|       Vertices:                                                      |
|         Map 1                                                        |
|           Map Operator Tree:                                         |
|               TableScan                                              |
|                 alias: test_data_02                                  |
{code}
[jira] [Updated] (HIVE-27898) When CBO is disabled, HIVE cannot query ICEBERG tables in nested subqueries properly
[ https://issues.apache.org/jira/browse/HIVE-27898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

yongzhi.shao updated HIVE-27898:
--------------------------------
    Affects Version/s: 4.0.0
[jira] [Updated] (HIVE-27898) When CBO is disabled, HIVE cannot query ICEBERG tables in nested subqueries properly
[ https://issues.apache.org/jira/browse/HIVE-27898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

yongzhi.shao updated HIVE-27898:
--------------------------------
    Issue Type: Bug  (was: Improvement)
[jira] [Updated] (HIVE-27898) When CBO is disabled, HIVE cannot query ICEBERG tables in nested subqueries properly
[ https://issues.apache.org/jira/browse/HIVE-27898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

yongzhi.shao updated HIVE-27898:
--------------------------------
    Summary: When CBO is disabled, HIVE cannot query ICEBERG tables in nested subqueries properly  (was, in Chinese: "After CBO is disabled, HIVE cannot properly query ICEBERG tables in nested subqueries.")
[jira] [Commented] (HIVE-27898) HIVE4 can't use ICEBERG table in subqueries
[ https://issues.apache.org/jira/browse/HIVE-27898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17819479#comment-17819479 ] yongzhi.shao commented on HIVE-27898: - [~zhangbutao]: Hi, I found the problem. Can you disable CBO and try again? set hive.cbo.enable=false;
[jira] [Updated] (HIVE-27898) When CBO is disabled, HIVE cannot query ICEBERG tables in nested subqueries properly
[ https://issues.apache.org/jira/browse/HIVE-27898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] yongzhi.shao updated HIVE-27898: Summary: When CBO is disabled, HIVE cannot query ICEBERG tables in nested subqueries properly (was: HIVE4 can't use ICEBERG table in subqueries)
[jira] [Updated] (HIVE-27898) HIVE4 can't use ICEBERG table in subqueries
[ https://issues.apache.org/jira/browse/HIVE-27898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] yongzhi.shao updated HIVE-27898: Attachment: hive-site.xml
[jira] [Updated] (HIVE-27898) HIVE4 can't use ICEBERG table in subqueries
[ https://issues.apache.org/jira/browse/HIVE-27898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] yongzhi.shao updated HIVE-27898: Attachment: (was: hive-site.xml)
[jira] [Comment Edited] (HIVE-27898) HIVE4 can't use ICEBERG table in subqueries
[ https://issues.apache.org/jira/browse/HIVE-27898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17819475#comment-17819475 ] yongzhi.shao edited comment on HIVE-27898 at 2/22/24 6:04 AM: -- [~zhangbutao] I tried the master branch with the latest code as of today, and the problem persists. Note that the previously anomalous Submit Plan metric in the Tez statistics has been fixed, which confirms I am running the latest code. Also, in your previous test one of the tables (testdb.test_data_02) is not an ICEBERG table. This SQL runs fine against regular internal tables; it is the ICEBERG table that fails.
{code:java}
0: jdbc:hive2://smaster01:1> select version();
INFO  : Compiling command(queryId=hive_20240222134713_2aaf182e-be55-4946-9a7c-c19d4a047e13): select version()
INFO  : Semantic Analysis Completed (retrial = false)
INFO  : Created Hive schema: Schema(fieldSchemas:[FieldSchema(name:_c0, type:string, comment:null)], properties:null)
INFO  : Completed compiling command(queryId=hive_20240222134713_2aaf182e-be55-4946-9a7c-c19d4a047e13); Time taken: 1.974 seconds
INFO  : Concurrency mode is disabled, not creating a lock manager
INFO  : Executing command(queryId=hive_20240222134713_2aaf182e-be55-4946-9a7c-c19d4a047e13): select version()
INFO  : Completed executing command(queryId=hive_20240222134713_2aaf182e-be55-4946-9a7c-c19d4a047e13); Time taken: 0.004 seconds
_c0
4.0.0-beta-2-SNAPSHOT r6e061e6559522c8a060c1b55439ada0001bf5e5d

0: jdbc:hive2://smaster01:1> select * from (select * from iceberg_dwd.test_data_04 limit 10) s1;
INFO  : Compiling command(queryId=hive_20240222134526_87055172-e808-4726-acdf-89f6b8b815e6): select * from (select * from iceberg_dwd.test_data_04 limit 10) s1
INFO  : Semantic Analysis Completed (retrial = false)
INFO  : Created Hive schema: Schema(fieldSchemas:[FieldSchema(name:s1.id, type:string, comment:null), FieldSchema(name:s1.name, type:string, comment:null)], properties:null)
INFO  : Completed compiling command(queryId=hive_20240222134526_87055172-e808-4726-acdf-89f6b8b815e6); Time taken: 1.007 seconds
INFO  : Concurrency mode is disabled, not creating a lock manager
INFO  : Executing command(queryId=hive_20240222134526_87055172-e808-4726-acdf-89f6b8b815e6): select * from (select * from iceberg_dwd.test_data_04 limit 10) s1
INFO  : Query ID = hive_20240222134526_87055172-e808-4726-acdf-89f6b8b815e6
INFO  : Total jobs = 1
INFO  : Launching Job 1 out of 1
INFO  : Starting task [Stage-1:MAPRED] in serial mode
INFO  : Subscribed to counters: [] for queryId: hive_20240222134526_87055172-e808-4726-acdf-89f6b8b815e6
INFO  : Tez session hasn't been created yet. Opening session
INFO  : Dag name: select * from (select..a_04 limit 10) s1 (Stage-1)
INFO  : HS2 Host: [smaster01], Query ID: [hive_20240222134526_87055172-e808-4726-acdf-89f6b8b815e6], Dag ID: [dag_1706163520799_39421_1], DAG Session ID: [application_1706163520799_39421]
INFO  : Status: Running (Executing on YARN cluster with App id application_1706163520799_39421)
----------------------------------------------------------------------------------------------
        VERTICES      MODE        STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
----------------------------------------------------------------------------------------------
Map 1 ..........  container     SUCCEEDED      1          1        0        0       0       0
Reducer 2 ......  container     SUCCEEDED      1          1        0        0       0       0
----------------------------------------------------------------------------------------------
VERTICES: 02/02  [==>>] 100%  ELAPSED TIME: 20.10 s
----------------------------------------------------------------------------------------------
INFO  : Status: DAG finished successfully in 19.90 seconds
INFO  : DAG ID: dag_1706163520799_39421_1
INFO  :
INFO  : Query Execution Summary
INFO  : --
INFO  : OPERATION                        DURATION
INFO  : --
INFO  : Compile Query                       1.01s
INFO  : Prepare Plan                       16.62s
INFO  : Get Query Coordinator (AM)          0.03s
INFO  : Submit Plan                         0.54s
INFO  : Start DAG                           0.07s
INFO  : Run DAG                            19.90s
INFO  : --
INFO  :
INFO  : Task Execution Summary
INFO  : --
INFO  : VERTICES  DURATION(ms)  CPU_TIME(ms)
[jira] [Comment Edited] (HIVE-27898) HIVE4 can't use ICEBERG table in subqueries
[ https://issues.apache.org/jira/browse/HIVE-27898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17819475#comment-17819475 ] yongzhi.shao edited comment on HIVE-27898 at 2/22/24 6:03 AM
[jira] [Comment Edited] (HIVE-27898) HIVE4 can't use ICEBERG table in subqueries
[ https://issues.apache.org/jira/browse/HIVE-27898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17819475#comment-17819475 ] yongzhi.shao edited comment on HIVE-27898 at 2/22/24 6:02 AM
[jira] [Comment Edited] (HIVE-27898) HIVE4 can't use ICEBERG table in subqueries
[ https://issues.apache.org/jira/browse/HIVE-27898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17819475#comment-17819475 ] yongzhi.shao edited comment on HIVE-27898 at 2/22/24 5:55 AM
[jira] [Comment Edited] (HIVE-27898) HIVE4 can't use ICEBERG table in subqueries
[ https://issues.apache.org/jira/browse/HIVE-27898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17819475#comment-17819475 ] yongzhi.shao edited comment on HIVE-27898 at 2/22/24 5:51 AM
[jira] [Comment Edited] (HIVE-26435) Add method for collecting HMS meta summary
[ https://issues.apache.org/jira/browse/HIVE-26435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17819470#comment-17819470 ] Zhihua Deng edited comment on HIVE-26435 at 2/22/24 4:55 AM: - Fix has been merged into master. Thank you [~danielzhu] and [~ruyi.zheng] for the work! was (Author: dengzh): Fix has been merged into master. Thank you [~danielzhu] for the work! > Add method for collecting HMS meta summary > -- > > Key: HIVE-26435 > URL: https://issues.apache.org/jira/browse/HIVE-26435 > Project: Hive > Issue Type: New Feature >Reporter: Ruyi Zheng >Assignee: Hongdan Zhu >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 0.5h > Remaining Estimate: 0h > > Hive Metastore currently lacks visibility into its metadata. This work > includes enhancing the Hive Metatool to include an option (JSON, CONSOLE) to > print a summary (catalog name, database name, table name, partition column > count, number of rows, table type, file type, compression type, total data > size, etc). -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (HIVE-26435) Add method for collecting HMS meta summary
[ https://issues.apache.org/jira/browse/HIVE-26435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhihua Deng resolved HIVE-26435. Fix Version/s: 4.0.0 Assignee: Hongdan Zhu (was: Naveen Gangam) Resolution: Fixed Fix has been merged into master. Thank you [~danielzhu] for the work! > Add method for collecting HMS meta summary > -- > > Key: HIVE-26435 > URL: https://issues.apache.org/jira/browse/HIVE-26435 > Project: Hive > Issue Type: New Feature >Reporter: Ruyi Zheng >Assignee: Hongdan Zhu >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 0.5h > Remaining Estimate: 0h > > Hive Metastore currently lacks visibility into its metadata. This work > includes enhancing the Hive Metatool to include an option (JSON, CONSOLE) to > print a summary (catalog name, database name, table name, partition column > count, number of rows, table type, file type, compression type, total data > size, etc). -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-26435) Add method for collecting HMS meta summary
[ https://issues.apache.org/jira/browse/HIVE-26435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhihua Deng updated HIVE-26435: --- Summary: Add method for collecting HMS meta summary (was: HMS Summary) > Add method for collecting HMS meta summary > -- > > Key: HIVE-26435 > URL: https://issues.apache.org/jira/browse/HIVE-26435 > Project: Hive > Issue Type: New Feature >Reporter: Ruyi Zheng >Assignee: Naveen Gangam >Priority: Major > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > Hive Metastore currently lacks visibility into its metadata. This work > includes enhancing the Hive Metatool to include an option (JSON, CONSOLE) to > print a summary (catalog name, database name, table name, partition column > count, number of rows, table type, file type, compression type, total data > size, etc). -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (HIVE-27778) Alter table command gives error after compute stats is run with Impala
[ https://issues.apache.org/jira/browse/HIVE-27778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhihua Deng resolved HIVE-27778. Fix Version/s: 4.0.0 Resolution: Fixed Fix has been merged into master. Thank you [~dkuzmenko] and [~zhangbutao] for the review! > Alter table command gives error after compute stats is run with Impala > --- > > Key: HIVE-27778 > URL: https://issues.apache.org/jira/browse/HIVE-27778 > Project: Hive > Issue Type: Bug >Reporter: Kokila N >Assignee: Zhihua Deng >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > A Hive partitioned table's column stats are stored at the partition level, > in PART_COL_STATS in sys. > When a "compute stats" query is run in Impala on a Hive partitioned > table, it generates column stats at the table level, stored in TAB_COL_STATS. > So, executing an "Alter ... rename to ..." after Impala compute > stats throws an error. > {code:java} > ERROR : Failed > org.apache.hadoop.hive.ql.metadata.HiveException: Unable to alter table. > Cannot change stats state for a transactional table default.parqtest without > providing the transactional write state for verification (new write ID 6, > valid write IDs null; current state null; new state {} {code} > The column stats generated from Impala need to be deleted for the alter command > to work. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HIVE-28085) YarnQueueHelper fails to access HTTPS enabled YARN WebService
Prabhu Joseph created HIVE-28085: Summary: YarnQueueHelper fails to access HTTPS enabled YARN WebService Key: HIVE-28085 URL: https://issues.apache.org/jira/browse/HIVE-28085 Project: Hive Issue Type: Bug Affects Versions: 4.0.0-alpha-1 Reporter: Prabhu Joseph YarnQueueHelper fails to access an HTTPS-enabled YARN WebService with the error below.
{code:java}
Caused by: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
at sun.security.provider.certpath.SunCertPathBuilder.build(SunCertPathBuilder.java:148) ~[?:1.8.0_402]
at sun.security.provider.certpath.SunCertPathBuilder.engineBuild(SunCertPathBuilder.java:129) ~[?:1.8.0_402]
at java.security.cert.CertPathBuilder.build(CertPathBuilder.java:280) ~[?:1.8.0_402]
at sun.security.validator.PKIXValidator.doBuild(PKIXValidator.java:451) ~[?:1.8.0_402]
at sun.security.validator.PKIXValidator.engineValidate(PKIXValidator.java:323) ~[?:1.8.0_402]
at sun.security.validator.Validator.validate(Validator.java:271) ~[?:1.8.0_402]
at sun.security.ssl.X509TrustManagerImpl.validate(X509TrustManagerImpl.java:315) ~[?:1.8.0_402]
at sun.security.ssl.X509TrustManagerImpl.checkTrusted(X509TrustManagerImpl.java:223) ~[?:1.8.0_402]
at sun.security.ssl.X509TrustManagerImpl.checkServerTrusted(X509TrustManagerImpl.java:129) ~[?:1.8.0_402]
at sun.security.ssl.CertificateMessage$T13CertificateConsumer.checkServerCerts(CertificateMessage.java:1340) ~[?:1.8.0_402]
at sun.security.ssl.CertificateMessage$T13CertificateConsumer.onConsumeCertificate(CertificateMessage.java:1231) ~[?:1.8.0_402]
at sun.security.ssl.CertificateMessage$T13CertificateConsumer.consume(CertificateMessage.java:1174) ~[?:1.8.0_402]
at sun.security.ssl.SSLHandshake.consume(SSLHandshake.java:377) ~[?:1.8.0_402]
at sun.security.ssl.HandshakeContext.dispatch(HandshakeContext.java:444) ~[?:1.8.0_402]
at sun.security.ssl.HandshakeContext.dispatch(HandshakeContext.java:422) ~[?:1.8.0_402]
at sun.security.ssl.TransportContext.dispatch(TransportContext.java:182) ~[?:1.8.0_402]
at sun.security.ssl.SSLTransport.decode(SSLTransport.java:152) ~[?:1.8.0_402]
at sun.security.ssl.SSLSocketImpl.decode(SSLSocketImpl.java:1401) ~[?:1.8.0_402]
at sun.security.ssl.SSLSocketImpl.readHandshakeRecord(SSLSocketImpl.java:1309) ~[?:1.8.0_402]
at sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:440) ~[?:1.8.0_402]
at sun.net.www.protocol.https.HttpsClient.afterConnect(HttpsClient.java:559) ~[?:1.8.0_402]
at sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.connect(AbstractDelegateHttpsURLConnection.java:197) ~[?:1.8.0_402]
at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1572) ~[?:1.8.0_402]
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1500) ~[?:1.8.0_402]
at java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:480) ~[?:1.8.0_402]
at sun.net.www.protocol.https.HttpsURLConnectionImpl.getResponseCode(HttpsURLConnectionImpl.java:352) ~[?:1.8.0_402]
at org.apache.hadoop.hive.ql.exec.tez.YarnQueueHelper.checkQueueAccessFromSingleRm(YarnQueueHelper.java:155) ~[hive-exec-3.1.3-amzn-8.jar:3.1.3-amzn-8]
{code}
The fix (HIVE-23062) corrected the URL but did not use the SSLFactory to set up the HttpURLConnection. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (HIVE-22738) CVE-2019-0205
[ https://issues.apache.org/jira/browse/HIVE-22738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17819320#comment-17819320 ] Dongjoon Hyun commented on HIVE-22738: -- For the record, THRIFT-4053 fixed the Java implementation for CVE-2019-0205 in 0.11.0. > CVE-2019-0205 > - > > Key: HIVE-22738 > URL: https://issues.apache.org/jira/browse/HIVE-22738 > Project: Hive > Issue Type: Bug > Components: Thrift API >Reporter: Laurent Goujon >Priority: Major > > There has been a CVE issued for a Thrift vulnerability which might impact > Hive. The CVE is > [CVE-2019-0205|https://nvd.nist.gov/vuln/detail/CVE-2019-0205], which impacts both > clients and servers, and might cause a denial of service through an infinite > loop. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (HIVE-28064) Add cause to ParseException for diagnosability purposes
[ https://issues.apache.org/jira/browse/HIVE-28064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stamatis Zampetakis resolved HIVE-28064. Fix Version/s: 4.1.0 Resolution: Fixed Fixed in https://github.com/apache/hive/commit/6e061e6559522c8a060c1b55439ada0001bf5e5d. Thanks for the reviews [~okumin] [~zhangbutao]! > Add cause to ParseException for diagnosability purposes > --- > > Key: HIVE-28064 > URL: https://issues.apache.org/jira/browse/HIVE-28064 > Project: Hive > Issue Type: Improvement > Components: Parser >Reporter: Stamatis Zampetakis >Assignee: Stamatis Zampetakis >Priority: Major > Labels: pull-request-available > Fix For: 4.1.0 > > > The > [ParseException|https://github.com/apache/hive/blob/4b01a607091581ac9bdb372f8b47c1efca4d4bb4/parser/src/java/org/apache/hadoop/hive/ql/parse/ParseException.java] > contains high-level information about problems encountered during parsing > but currently the stacktrace is pretty shallow. > {code:sql} > select * from author where true > fname in ('Alex','Rob') > {code} > Currently the query above will fail with the following stacktrace: > {noformat} > org.apache.hadoop.hive.ql.parse.ParseException: line 4:41 cannot recognize > input near 'in' '(' ''Alex'' in expression specification > at > org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:125) > at > org.apache.hadoop.hive.ql.parse.ParseUtils.parse(ParseUtils.java:97) > at > org.apache.hadoop.hive.ql.parse.ParseUtils.parse(ParseUtils.java:89) > at org.apache.hadoop.hive.ql.Compiler.parse(Compiler.java:172) > at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:105) > at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:519) > at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:471) > at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:436) > at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:430) > at > 
org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:121) > at > org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:227) > at > org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:257) > at > org.apache.hadoop.hive.cli.CliDriver.processCmd1(CliDriver.java:201) > at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:127) > at > org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:425) > at > org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:356) > {noformat} > The end-user gets a hint about what the error might be but the developer has > no way to tell how far we went into parsing the given statement and which > grammar rule failed to pass. > In some cases we are catching the RecognitionException which shows exactly at > which point the parser failed to advance so we could pass it as a cause > inside the ParseException. With this change the stacktrace would be similar > to the one below: > {noformat} > org.apache.hadoop.hive.ql.parse.ParseException: line 4:41 cannot recognize > input near 'in' '(' ''Alex'' in expression specification > at > org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:125) > at > org.apache.hadoop.hive.ql.parse.ParseUtils.parse(ParseUtils.java:97) > at > org.apache.hadoop.hive.ql.parse.ParseUtils.parse(ParseUtils.java:89) > at org.apache.hadoop.hive.ql.Compiler.parse(Compiler.java:172) > at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:105) > at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:519) > at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:471) > at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:436) > at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:430) > at > org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:121) > at > org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:227) > at > 
org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:257) > at > org.apache.hadoop.hive.cli.CliDriver.processCmd1(CliDriver.java:201) > at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:127) > at > org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:425) > at > org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:356) > Caused by: NoViableAltException(173@[()* loopback of 739:5: (equal= > precedenceEqualOperator p= precedenceSimilarExpression -> ^( $equal $p) > |dist= precedenceDistinctOperator p=
[jira] [Commented] (HIVE-27598) Enhance alter table compact to work for partitioned tables without specifying any partition
[ https://issues.apache.org/jira/browse/HIVE-27598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17819281#comment-17819281 ] Stamatis Zampetakis commented on HIVE-27598: I have been reviewing HIVE-27848 and noticed that Initiator was split into multiple classes as part of this ticket. I understand that the idea was to reuse code that was already there, but the current design, which relies on inheritance, is not ideal. Due to inheritance the {{InitiatorBase}} class becomes a Thread, something that doesn't really make sense, and it comes with additional overhead every time we instantiate it. Moreover, the only class that currently extends {{InitiatorBase}} is {{Initiator}}, and I hardly see how we can make other extensions from {{InitiatorBase}}; the code becomes complex and any subtle change in {{InitiatorBase}} may have unpredictable effects on {{Initiator}}. Having a "Base" class that is not really meant to be extended, with no instructions on how to do so, is a weird design pattern. If someone finds time, it would be nice to revisit this decision and remove inheritance and {{InitiatorBase}} from the picture. 
> Enhance alter table compact to work for partitioned tables without specifying > any partition > --- > > Key: HIVE-27598 > URL: https://issues.apache.org/jira/browse/HIVE-27598 > Project: Hive > Issue Type: Improvement > Components: Hive >Reporter: Taraka Rama Rao Lethavadla >Assignee: Taraka Rama Rao Lethavadla >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0-beta-1 > > > *Current implementation:* the command below throws an error > > {noformat} > alter table PARTITIONED_T compact 'minor';{noformat} > for a partitioned table: > {code:java} > You must specify a partition to compact for partitioned tables{code} > {*}Problem{*}: Let's say a table contains thousands of partitions, and due to some issue, such as file permissions, automatic compaction marks all or a few partitions as failed and never picks them up for compaction again. Only after we see performance degrade do we find out that there is a problem with compaction. > To solve it, we have two approaches: adjust the failure thresholds and restart HMS to pick up the changes, which does not work for someone who cannot restart HMS; > or execute alter table for each partition individually. > {*}Solution{*}: > In this Jira we plan to allow the alter table compact command on a partitioned > table without specifying any partition name. > The proposed solution is to list all the partitions, find those that are > eligible for compaction, and put an entry for each of them in the compaction queue. -- This message was sent by Atlassian Jira (v8.20.10#820010)
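The proposal above can be sketched in SQL. This is an illustrative sketch only; the table name comes from the issue description, while the partition spec (ds='2023-01-01') is a hypothetical example:

```sql
-- Before this change, compacting a partitioned table required naming each
-- partition explicitly, e.g. (partition spec hypothetical):
alter table PARTITIONED_T partition (ds='2023-01-01') compact 'minor';

-- With the proposed enhancement, the partition clause may be omitted;
-- Hive is expected to enumerate the partitions, find the eligible ones,
-- and enqueue each of them for compaction:
alter table PARTITIONED_T compact 'minor';
```

This avoids issuing one statement per partition when a large fraction of a table's partitions needs to be re-queued after compaction failures.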
[jira] [Resolved] (HIVE-28081) Code refine on ClearDanglingScratchDir::removeLocalTmpFiles
[ https://issues.apache.org/jira/browse/HIVE-28081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Butao Zhang resolved HIVE-28081. Fix Version/s: 4.1.0 Resolution: Fixed Merged into master branch. Thanks [~okumin] & [~zabetak] for the review!!! > Code refine on ClearDanglingScratchDir::removeLocalTmpFiles > --- > > Key: HIVE-28081 > URL: https://issues.apache.org/jira/browse/HIVE-28081 > Project: Hive > Issue Type: Improvement >Reporter: Butao Zhang >Assignee: Butao Zhang >Priority: Trivial > Labels: pull-request-available > Fix For: 4.1.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (HIVE-28015) Iceberg: Add identifier-field-ids support in Hive
[ https://issues.apache.org/jira/browse/HIVE-28015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Butao Zhang resolved HIVE-28015. Fix Version/s: 4.1.0 Resolution: Fixed Merged into master branch. Thanks [~dkuzmenko] for the review!!! > Iceberg: Add identifier-field-ids support in Hive > - > > Key: HIVE-28015 > URL: https://issues.apache.org/jira/browse/HIVE-28015 > Project: Hive > Issue Type: Improvement > Components: Iceberg integration >Affects Versions: 4.0.0 >Reporter: Denys Kuzmenko >Assignee: Butao Zhang >Priority: Major > Labels: pull-request-available > Fix For: 4.1.0 > > > Some writer engines require primary keys on a table so that they can use them > for writing equality deletes (only the PK cols are written to the eq-delete > files). > Hive currently doesn't reject setting PKs for Iceberg tables, however, it > just ignores them. This succeeds: > {code:java} > create table ice_pk (i int, j int, primary key(i)) stored by iceberg; > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-28082) HiveAggregateReduceFunctionsRule could generate an inconsistent result
[ https://issues.apache.org/jira/browse/HIVE-28082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] okumin updated HIVE-28082: -- Status: Patch Available (was: Open) I created a PR. https://github.com/apache/hive/pull/5091 > HiveAggregateReduceFunctionsRule could generate an inconsistent result > -- > > Key: HIVE-28082 > URL: https://issues.apache.org/jira/browse/HIVE-28082 > Project: Hive > Issue Type: Bug > Components: CBO >Affects Versions: 4.0.0-beta-1 >Reporter: okumin >Assignee: okumin >Priority: Major > > HiveAggregateReduceFunctionsRule translates AVG, STDDEV_POP, STDDEV_SAMP, > VAR_POP, and VAR_SAMP. Those UDFs accept string types and try to decode them > as floating point values. It is possible that undecodable values exist. > We found that it could cause inconsistent behaviors with or without CBO. > {code:java} > 0: jdbc:hive2://hive-hiveserver2:1/defaul> SELECT AVG('text'); > ... > +--+ > | _c0 | > +--+ > | 0.0 | > +--+ > 1 row selected (18.229 seconds) > 0: jdbc:hive2://hive-hiveserver2:1/defaul> set hive.cbo.enable=false; > No rows affected (0.013 seconds) > 0: jdbc:hive2://hive-hiveserver2:1/defaul> SELECT AVG('text'); > ... > +---+ > | _c0 | > +---+ > | NULL | > +---+ {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
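The divergence reported above can be simulated with a small sketch. This is not Hive code, and the exact coercion the rewritten plan applies is an assumption here: the simulation merely reproduces the observed NULL-vs-0.0 split, assuming the direct AVG UDF skips an undecodable string while the CBO-rewritten SUM/COUNT form treats the failed cast as 0.

```java
class AvgRewriteDemo {
    static Double parse(String s) {
        try { return Double.valueOf(s); } catch (NumberFormatException e) { return null; }
    }

    // Non-CBO path: an undecodable value contributes neither to the sum nor
    // to the count, so AVG over only undecodable input is NULL.
    static Double avgDirect(String[] xs) {
        double sum = 0; long n = 0;
        for (String s : xs) {
            Double v = parse(s);
            if (v != null) { sum += v; n++; }
        }
        return n == 0 ? null : sum / n;
    }

    // Hypothetical rewritten path AVG(x) -> SUM(x)/COUNT(x): if the failed
    // cast is coerced to 0 while the row is still counted, the result is 0.0.
    static Double avgRewritten(String[] xs) {
        double sum = 0; long n = 0;
        for (String s : xs) {
            Double v = parse(s);
            sum += (v == null ? 0 : v);
            n++;
        }
        return n == 0 ? null : sum / n;
    }
}
```

Under these assumptions, {{avgDirect(new String[]{"text"})}} is NULL while {{avgRewritten(new String[]{"text"})}} is 0.0, matching the beeline output in the description.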
[jira] [Comment Edited] (HIVE-27898) HIVE4 can't use ICEBERG table in subqueries
[ https://issues.apache.org/jira/browse/HIVE-27898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17819172#comment-17819172 ] yongzhi.shao edited comment on HIVE-27898 at 2/21/24 10:27 AM: --- That's strange. Our execution plans are different? you: one map task me: one map -> one reduce {code:java} -- VERTICES MODE STATUS TOTAL COMPLETED RUNNING PENDING FAILED KILLED -- Map 1 .. container SUCCEEDED 1 1 0 0 0 0 Reducer 2 .. container SUCCEEDED 1 1 0 0 0 0 -- VERTICES: 02/02 [==>>] 100% ELAPSED TIME: 26.27 s -- INFO : Status: DAG finished successfully in 26.06 seconds INFO : DAG ID: dag_1706163520799_38211_2 INFO : INFO : Query Execution Summary INFO : -- INFO : OPERATION DURATION INFO : -- INFO : Compile Query 0.00s INFO : Prepare Plan 0.00s INFO : Get Query Coordinator (AM) 0.00s INFO : Submit Plan 1708509885.24s INFO : Start DAG 0.04s INFO : Run DAG 26.05s INFO : -- INFO : INFO : Task Execution Summary INFO : -- INFO : VERTICES DURATION(ms) CPU_TIME(ms) GC_TIME(ms) INPUT_RECORDS OUTPUT_RECORDS INFO : -- INFO : Map 1 3949.00 8,360 118 2 2 INFO : Reducer 2 0.00 820 0 2 0 INFO : -- INFO : INFO : org.apache.tez.common.counters.DAGCounter: INFO : NUM_SUCCEEDED_TASKS: 2 INFO : TOTAL_LAUNCHED_TASKS: 2 INFO : RACK_LOCAL_TASKS: 1 INFO : AM_CPU_MILLISECONDS: 2650 INFO : WALL_CLOCK_MILLIS: 3912 INFO : AM_GC_TIME_MILLIS: 0 INFO : INITIAL_HELD_CONTAINERS: 0 INFO : TOTAL_CONTAINERS_USED: 1 INFO : TOTAL_CONTAINER_ALLOCATION_COUNT: 1 INFO : TOTAL_CONTAINER_LAUNCH_COUNT: 1 INFO : TOTAL_CONTAINER_REUSE_COUNT: 1 INFO : File System Counters: INFO : FILE_BYTES_READ: 0 INFO : FILE_BYTES_WRITTEN: 0 INFO : FILE_READ_OPS: 0 INFO : FILE_LARGE_READ_OPS: 0 INFO : FILE_WRITE_OPS: 0 INFO : HDFS_BYTES_READ: 1210 INFO : HDFS_BYTES_WRITTEN: 121 INFO : HDFS_READ_OPS: 8 INFO : HDFS_LARGE_READ_OPS: 0 INFO : HDFS_WRITE_OPS: 2 INFO : org.apache.tez.common.counters.TaskCounter: INFO : SPILLED_RECORDS: 0 INFO : NUM_SHUFFLED_INPUTS: 0 INFO : NUM_FAILED_SHUFFLE_INPUTS: 0 INFO : 
GC_TIME_MILLIS: 118 INFO : CPU_MILLISECONDS: 9180 INFO : WALL_CLOCK_MILLISECONDS: 3726 INFO : PHYSICAL_MEMORY_BYTES: 3435134976 INFO : VIRTUAL_MEMORY_BYTES: 7564333056 INFO : COMMITTED_HEAP_BYTES: 3435134976 INFO : INPUT_RECORDS_PROCESSED: 4 INFO : INPUT_SPLIT_LENGTH_BYTES: 880 INFO : OUTPUT_RECORDS: 2 INFO : APPROXIMATE_INPUT_RECORDS: 2 INFO : OUTPUT_LARGE_RECORDS: 0 INFO : OUTPUT_BYTES: 8 INFO : OUTPUT_BYTES_WITH_OVERHEAD: 18 INFO : OUTPUT_BYTES_PHYSICAL: 46 INFO : ADDITIONAL_SPILLS_BYTES_WRITTEN: 0 INFO : ADDITIONAL_SPILLS_BYTES_READ: 0 INFO : ADDITIONAL_SPILL_COUNT: 0 INFO : SHUFFLE_BYTES: 0 INFO : SHUFFLE_BYTES_DECOMPRESSED: 0 INFO : SHUFFLE_BYTES_TO_MEM: 0 INFO : SHUFFLE_BYTES_TO_DISK: 0 INFO : SHUFFLE_BYTES_DISK_DIRECT: 0 INFO : SHUFFLE_PHASE_TIME: 58 INFO : FIRST_EVENT_RECEIVED: 58 INFO : LAST_EVENT_RECEIVED: 58 INFO : DATA_BYTES_VIA_EVENT: 22 INFO : HIVE: INFO : CREATED_FILES: 1 INFO : DESERIALIZE_ERRORS: 0 INFO : RECORDS_IN_Map_1: 2 INFO : RECORDS_OUT_0: 2 INFO : RECORDS_OUT_INTERMEDIATE_Map_1: 2 INFO : RECORDS_OUT_INTERMEDIATE_Reducer_2: 0 INFO : RECORDS_OUT_OPERATOR_FS_13: 2 INFO : RECORDS_OUT_OPERATOR_LIM_11: 2 INFO : RECORDS_OUT_OPERATOR_LIM_8: 2 INFO : RECORDS_OUT_OPERATOR_MAP_0: 0 INFO :
[jira] [Commented] (HIVE-27898) HIVE4 can't use ICEBERG table in subqueries
[ https://issues.apache.org/jira/browse/HIVE-27898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17819172#comment-17819172 ] yongzhi.shao commented on HIVE-27898: - That's strange. Our execution plans are different? > HIVE4 can't use ICEBERG table in subqueries > --- > > Key: HIVE-27898 > URL: https://issues.apache.org/jira/browse/HIVE-27898 > Project: Hive > Issue Type: Improvement > Components: Iceberg integration >Affects Versions: 4.0.0-beta-1 >Reporter: yongzhi.shao >Priority: Critical > > Currently, we found that when using HIVE4-BETA1 version, if we use ICEBERG > table in the subquery, we can't get any data in the end. > I have used HIVE3-TEZ for cross validation and HIVE3 does not have this > problem when querying ICEBERG. > {code:java} > --spark3.4.1+iceberg 1.4.2 > CREATE TABLE datacenter.dwd.b_std_trade ( > uni_order_id STRING, > data_from BIGINT, > partner STRING, > plat_code STRING, > order_id STRING, > uni_shop_id STRING, > uni_id STRING, > guide_id STRING, > shop_id STRING, > plat_account STRING, > total_fee DOUBLE, > item_discount_fee DOUBLE, > trade_discount_fee DOUBLE, > adjust_fee DOUBLE, > post_fee DOUBLE, > discount_rate DOUBLE, > payment_no_postfee DOUBLE, > payment DOUBLE, > pay_time STRING, > product_num BIGINT, > order_status STRING, > is_refund STRING, > refund_fee DOUBLE, > insert_time STRING, > created STRING, > endtime STRING, > modified STRING, > trade_type STRING, > receiver_name STRING, > receiver_country STRING, > receiver_state STRING, > receiver_city STRING, > receiver_district STRING, > receiver_town STRING, > receiver_address STRING, > receiver_mobile STRING, > trade_source STRING, > delivery_type STRING, > consign_time STRING, > orders_num BIGINT, > is_presale BIGINT, > presale_status STRING, > first_fee_paytime STRING, > last_fee_paytime STRING, > first_paid_fee DOUBLE, > tenant STRING, > tidb_modified STRING, > step_paid_fee DOUBLE, > seller_flag STRING, > is_used_store_card BIGINT, > store_card_used 
DOUBLE, > store_card_basic_used DOUBLE, > store_card_expand_used DOUBLE, > order_promotion_num BIGINT, > item_promotion_num BIGINT, > buyer_remark STRING, > seller_remark STRING, > trade_business_type STRING) > USING iceberg > PARTITIONED BY (uni_shop_id, truncate(4, created)) > LOCATION '/iceberg-catalog/warehouse/dwd/b_std_trade' > TBLPROPERTIES ( > 'current-snapshot-id' = '7217819472703702905', > 'format' = 'iceberg/orc', > 'format-version' = '1', > 'hive.stored-as' = 'iceberg', > 'read.orc.vectorization.enabled' = 'true', > 'sort-order' = 'uni_shop_id ASC NULLS FIRST, created ASC NULLS FIRST', > 'write.distribution-mode' = 'hash', > 'write.format.default' = 'orc', > 'write.metadata.delete-after-commit.enabled' = 'true', > 'write.metadata.previous-versions-max' = '3', > 'write.orc.bloom.filter.columns' = 'order_id', > 'write.orc.compression-codec' = 'zstd') > --hive-iceberg > CREATE EXTERNAL TABLE iceberg_dwd.b_std_trade > STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler' > LOCATION 'hdfs:///iceberg-catalog/warehouse/dwd/b_std_trade' > TBLPROPERTIES > ('iceberg.catalog'='location_based_table','engine.hive.enabled'='true'); > select * from iceberg_dwd.b_std_trade > where uni_shop_id = 'TEST|1' limit 10 --10 rows > select * > from ( > select * from iceberg_dwd.b_std_trade > where uni_shop_id = 'TEST|1' limit 10 > ) t1; --10 rows > select uni_shop_id > from ( > select * from iceberg_dwd.b_std_trade > where uni_shop_id = 'TEST|1' limit 10 > ) t1; --0 rows > select uni_shop_id > from ( > select uni_shop_id as uni_shop_id from iceberg_dwd.b_std_trade > where uni_shop_id = 'TEST|1' limit 10 > ) t1; --0 rows > --hive-orc > select uni_shop_id > from ( > select * from iceberg_dwd.trade_test > where uni_shop_id = 'TEST|1' limit 10 > ) t1;--10 ROWS{code} > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (HIVE-27898) HIVE4 can't use ICEBERG table in subqueries
[ https://issues.apache.org/jira/browse/HIVE-27898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17819169#comment-17819169 ] Butao Zhang commented on HIVE-27898: *I have tested the two case: task and no task, both are fine.* // set conf in beeline: set hive.fetch.task.conversion=none; //enforce launching tez task set hive.explain.user=false; {code:java} 0: jdbc:hive2://127.0.0.1:10004/default> select * from (select * from testdb.test_data_02 limit 10) s1 -- VERTICES MODE STATUS TOTAL COMPLETED RUNNING PENDING FAILED KILLED -- Map 1 .. container SUCCEEDED 1 1 0 0 0 0 -- VERTICES: 01/01 [==>>] 100% ELAPSED TIME: 21.55 s -- INFO : Completed executing command(queryId=hive_20240221181020_b4a5968f-3594-4b1a-bc24-1b968cf6a993); Time taken: 6.12 seconds ++--+ | s1.id | s1.name | ++--+ | 1 | a | | 2 | b | ++--+ 2 rows selected (6.442 seconds) 0: jdbc:hive2://127.0.0.1:1/default> explain select * from (select * from testdb.test_data_02 limit 10) s1; ++ | Explain | ++ | STAGE DEPENDENCIES: | | Stage-1 is a root stage | | Stage-0 depends on stages: Stage-1 | | | | STAGE PLANS: | | Stage: Stage-1 | | Tez | | DagId: hive_20240221181118_d4e74ca2-cc3b-426c-830a-ee32fd3801b4:4 | | DagName: hive_20240221181118_d4e74ca2-cc3b-426c-830a-ee32fd3801b4:4 | | Vertices: | | Map 1 | | Map Operator Tree: | | TableScan | | alias: test_data_02 | | Statistics: Num rows: 2 Data size: 736 Basic stats: COMPLETE Column stats: NONE | | Limit | | Number of rows: 10 | | Statistics: Num rows: 2 Data size: 736 Basic stats: COMPLETE Column stats: NONE | | Select Operator | | expressions: id (type: string), name (type: string) | | outputColumnNames: _col0, _col1 | | Statistics: Num rows: 2 Data size: 736 Basic stats: COMPLETE Column stats: NONE | | File Output Operator | | compressed: false | | Statistics: Num rows: 2 Data size: 736 Basic stats: COMPLETE Column stats: NONE | | table: | | input format: org.apache.hadoop.mapred.SequenceFileInputFormat | | output format: 
org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat | | serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe | | Execution mode: vectorized | | | | Stage: Stage-0 | | Fetch Operator | | limit: 10 | | Processor Tree: | | ListSink | | | ++ 37 rows selected (0.294 seconds) {code} > HIVE4 can't use ICEBERG table in subqueries > --- > > Key: HIVE-27898 > URL: https://issues.apache.org/jira/browse/HIVE-27898 > Project: Hive > Issue Type: Improvement > Components: Iceberg integration >Affects Versions: 4.0.0-beta-1 >Reporter: yongzhi.shao >Priority: Critical > > Currently, we found that when using HIVE4-BETA1 version, if we use ICEBERG > table in the subquery, we can't get any data in the end. > I have used HIVE3-TEZ for cross validation and HIVE3 does not have this > problem when querying ICEBERG. > {code:java} > --spark3.4.1+iceberg 1.4.2 >
[jira] [Comment Edited] (HIVE-27898) HIVE4 can't use ICEBERG table in subqueries
[ https://issues.apache.org/jira/browse/HIVE-27898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17819167#comment-17819167 ] yongzhi.shao edited comment on HIVE-27898 at 2/21/24 10:08 AM: --- Did HIVE start a TEZ task when you executed this SQL? select * from (select * from testdb.test_data_02 limit 10) s1; Also, can you share the SQL-plan? here is mine: Explain Vertex dependency in root stage Explain Reducer 2 <- Map 1 (CUSTOM_SIMPLE_EDGE) Explain Explain Stage-0 Explain Fetch Operator Explain limit:-1 Explain Stage-1 Explain Reducer 2 vectorized Explain File Output Operator [FS_13] Explain Select Operator [SEL_12] (rows=2 width=4430) Explain Output:["_col0","_col1"] Explain Limit [LIM_11] (rows=2 width=4430) Explain Number of rows:10 Explain <-Map 1 [CUSTOM_SIMPLE_EDGE] vectorized Explain PARTITION_ONLY_SHUFFLE [RS_10] Explain Select Operator [SEL_9] (rows=2 width=4430) Explain Output:["_col0","_col1"] Explain Limit [LIM_8] (rows=2 width=4430) Explain Number of rows:10 Explain TableScan [TS_0] (rows=2 width=4430) Explain iceberg_dwd@test_data_02,test_data_02,Tbl:COMPLETE,Col:NONE,Output:["id","name"] Explain 22 rows selected (0.979 seconds) 0: jdbc:hive2://smaster01:1> explain select * from (select * from iceberg_dwd.test_data_02 limit 10) s1; was (Author: lisoda): Did HIVE start a TEZ task when you executed this SQL? select * from (select * from testdb.test_data_02 limit 10) s1; Also, can you share the implementation plan? 
here is mine: Explain Vertex dependency in root stage Explain Reducer 2 <- Map 1 (CUSTOM_SIMPLE_EDGE) Explain Explain Stage-0 Explain Fetch Operator Explain limit:-1 Explain Stage-1 Explain Reducer 2 vectorized Explain File Output Operator [FS_13] Explain Select Operator [SEL_12] (rows=2 width=4430) Explain Output:["_col0","_col1"] Explain Limit [LIM_11] (rows=2 width=4430) Explain Number of rows:10 Explain <-Map 1 [CUSTOM_SIMPLE_EDGE] vectorized Explain PARTITION_ONLY_SHUFFLE [RS_10] Explain Select Operator [SEL_9] (rows=2 width=4430) Explain Output:["_col0","_col1"] Explain Limit [LIM_8] (rows=2 width=4430) Explain Number of rows:10 Explain TableScan [TS_0] (rows=2 width=4430) Explain iceberg_dwd@test_data_02,test_data_02,Tbl:COMPLETE,Col:NONE,Output:["id","name"] Explain 22 rows selected (0.979 seconds) 0: jdbc:hive2://smaster01:1> explain select * from (select * from iceberg_dwd.test_data_02 limit 10) s1; > HIVE4 can't use ICEBERG table in subqueries > --- > > Key: HIVE-27898 > URL: https://issues.apache.org/jira/browse/HIVE-27898 > Project: Hive > Issue Type: Improvement > Components: Iceberg integration >Affects Versions: 4.0.0-beta-1 >Reporter: yongzhi.shao >Priority: Critical > > Currently, we found that when using HIVE4-BETA1 version, if we use ICEBERG > table in the subquery, we can't get any data in the end. > I have used HIVE3-TEZ for cross validation and HIVE3 does not have this > problem when querying ICEBERG. 
> {code:java} > --spark3.4.1+iceberg 1.4.2 > CREATE TABLE datacenter.dwd.b_std_trade ( > uni_order_id STRING, > data_from BIGINT, > partner STRING, > plat_code STRING, > order_id STRING, > uni_shop_id STRING, > uni_id STRING, > guide_id STRING, > shop_id STRING, > plat_account STRING, > total_fee DOUBLE, > item_discount_fee DOUBLE, > trade_discount_fee DOUBLE, > adjust_fee DOUBLE, > post_fee DOUBLE, > discount_rate DOUBLE, > payment_no_postfee DOUBLE, > payment DOUBLE, > pay_time STRING, > product_num BIGINT, > order_status STRING, > is_refund STRING, > refund_fee DOUBLE, > insert_time STRING, > created STRING, > endtime STRING, > modified STRING, > trade_type STRING, > receiver_name STRING, > receiver_country STRING, > receiver_state STRING, > receiver_city STRING, > receiver_district STRING, > receiver_town STRING, > receiver_address STRING, > receiver_mobile STRING, > trade_source STRING, > delivery_type STRING, > consign_time STRING, > orders_num BIGINT, > is_presale BIGINT, > presale_status STRING, > first_fee_paytime STRING, > last_fee_paytime STRING, > first_paid_fee DOUBLE, > tenant STRING, > tidb_modified STRING, > step_paid_fee DOUBLE, > seller_flag STRING, > is_used_store_card BIGINT, > store_card_used DOUBLE, > store_card_basic_used
[jira] [Commented] (HIVE-27898) HIVE4 can't use ICEBERG table in subqueries
[ https://issues.apache.org/jira/browse/HIVE-27898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17819167#comment-17819167 ] yongzhi.shao commented on HIVE-27898: - Did HIVE start a TEZ task when you executed this SQL? select * from (select * from testdb.test_data_02 limit 10) s1; > HIVE4 can't use ICEBERG table in subqueries > --- > > Key: HIVE-27898 > URL: https://issues.apache.org/jira/browse/HIVE-27898 > Project: Hive > Issue Type: Improvement > Components: Iceberg integration >Affects Versions: 4.0.0-beta-1 >Reporter: yongzhi.shao >Priority: Critical > > Currently, we found that when using HIVE4-BETA1 version, if we use ICEBERG > table in the subquery, we can't get any data in the end. > I have used HIVE3-TEZ for cross validation and HIVE3 does not have this > problem when querying ICEBERG. > {code:java} > --spark3.4.1+iceberg 1.4.2 > CREATE TABLE datacenter.dwd.b_std_trade ( > uni_order_id STRING, > data_from BIGINT, > partner STRING, > plat_code STRING, > order_id STRING, > uni_shop_id STRING, > uni_id STRING, > guide_id STRING, > shop_id STRING, > plat_account STRING, > total_fee DOUBLE, > item_discount_fee DOUBLE, > trade_discount_fee DOUBLE, > adjust_fee DOUBLE, > post_fee DOUBLE, > discount_rate DOUBLE, > payment_no_postfee DOUBLE, > payment DOUBLE, > pay_time STRING, > product_num BIGINT, > order_status STRING, > is_refund STRING, > refund_fee DOUBLE, > insert_time STRING, > created STRING, > endtime STRING, > modified STRING, > trade_type STRING, > receiver_name STRING, > receiver_country STRING, > receiver_state STRING, > receiver_city STRING, > receiver_district STRING, > receiver_town STRING, > receiver_address STRING, > receiver_mobile STRING, > trade_source STRING, > delivery_type STRING, > consign_time STRING, > orders_num BIGINT, > is_presale BIGINT, > presale_status STRING, > first_fee_paytime STRING, > last_fee_paytime STRING, > first_paid_fee DOUBLE, > tenant STRING, > tidb_modified STRING, > step_paid_fee DOUBLE, > 
seller_flag STRING, > is_used_store_card BIGINT, > store_card_used DOUBLE, > store_card_basic_used DOUBLE, > store_card_expand_used DOUBLE, > order_promotion_num BIGINT, > item_promotion_num BIGINT, > buyer_remark STRING, > seller_remark STRING, > trade_business_type STRING) > USING iceberg > PARTITIONED BY (uni_shop_id, truncate(4, created)) > LOCATION '/iceberg-catalog/warehouse/dwd/b_std_trade' > TBLPROPERTIES ( > 'current-snapshot-id' = '7217819472703702905', > 'format' = 'iceberg/orc', > 'format-version' = '1', > 'hive.stored-as' = 'iceberg', > 'read.orc.vectorization.enabled' = 'true', > 'sort-order' = 'uni_shop_id ASC NULLS FIRST, created ASC NULLS FIRST', > 'write.distribution-mode' = 'hash', > 'write.format.default' = 'orc', > 'write.metadata.delete-after-commit.enabled' = 'true', > 'write.metadata.previous-versions-max' = '3', > 'write.orc.bloom.filter.columns' = 'order_id', > 'write.orc.compression-codec' = 'zstd') > --hive-iceberg > CREATE EXTERNAL TABLE iceberg_dwd.b_std_trade > STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler' > LOCATION 'hdfs:///iceberg-catalog/warehouse/dwd/b_std_trade' > TBLPROPERTIES > ('iceberg.catalog'='location_based_table','engine.hive.enabled'='true'); > select * from iceberg_dwd.b_std_trade > where uni_shop_id = 'TEST|1' limit 10 --10 rows > select * > from ( > select * from iceberg_dwd.b_std_trade > where uni_shop_id = 'TEST|1' limit 10 > ) t1; --10 rows > select uni_shop_id > from ( > select * from iceberg_dwd.b_std_trade > where uni_shop_id = 'TEST|1' limit 10 > ) t1; --0 rows > select uni_shop_id > from ( > select uni_shop_id as uni_shop_id from iceberg_dwd.b_std_trade > where uni_shop_id = 'TEST|1' limit 10 > ) t1; --0 rows > --hive-orc > select uni_shop_id > from ( > select * from iceberg_dwd.trade_test > where uni_shop_id = 'TEST|1' limit 10 > ) t1;--10 ROWS{code} > -- This message was sent by Atlassian Jira (v8.20.10#820010)
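Condensing the queries quoted above, the failing pattern is projecting a subset of columns from a star-expanded, LIMIT-ed subquery over the Iceberg table (this is a summary of the results already reported in the thread, not new measurements):
{code:sql}
-- works: 10 rows
SELECT * FROM (SELECT * FROM iceberg_dwd.b_std_trade WHERE uni_shop_id = 'TEST|1' LIMIT 10) t1;

-- fails on the Iceberg table: 0 rows
SELECT uni_shop_id FROM (SELECT * FROM iceberg_dwd.b_std_trade WHERE uni_shop_id = 'TEST|1' LIMIT 10) t1;

-- same query shape against a plain ORC table (iceberg_dwd.trade_test): 10 rows
SELECT uni_shop_id FROM (SELECT * FROM iceberg_dwd.trade_test WHERE uni_shop_id = 'TEST|1' LIMIT 10) t1;
{code}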
[jira] [Commented] (HIVE-27898) HIVE4 can't use ICEBERG table in subqueries
[ https://issues.apache.org/jira/browse/HIVE-27898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17819165#comment-17819165 ] Butao Zhang commented on HIVE-27898: I just did the test again based on the master branch, and everything is OK, same as my earlier test at https://issues.apache.org/jira/browse/HIVE-27898?focusedCommentId=17790407&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17790407 . I think branch-4.0 is also fine. I am not sure why you still have this issue. Maybe you can try the master branch again?
{code:java}
0: jdbc:hive2://127.0.0.1:1/default> select name from (select * from testdb.test_data_02 limit 10) s1;
+-------+
| name  |
+-------+
| a     |
| b     |
+-------+
2 rows selected (0.382 seconds)
0: jdbc:hive2://127.0.0.1:1/default> select id from (select * from testdb.test_data_02 limit 10) s1;
+-----+
| id  |
+-----+
| 1   |
| 2   |
+-----+
2 rows selected (0.372 seconds)
0: jdbc:hive2://127.0.0.1:1/default> select * from testdb.test_data_02 limit 10;
+------------------+--------------------+
| test_data_02.id  | test_data_02.name  |
+------------------+--------------------+
| 1                | a                  |
| 2                | b                  |
+------------------+--------------------+
2 rows selected (0.355 seconds)
0: jdbc:hive2://127.0.0.1:1/default> select * from (select * from testdb.test_data_02 limit 10) s1;
+--------+----------+
| s1.id  | s1.name  |
+--------+----------+
| 1      | a        |
| 2      | b        |
+--------+----------+
2 rows selected (0.346 seconds){code}
[jira] [Created] (HIVE-28084) Iceberg: COW fix for Merge operation
Denys Kuzmenko created HIVE-28084: - Summary: Iceberg: COW fix for Merge operation Key: HIVE-28084 URL: https://issues.apache.org/jira/browse/HIVE-28084 Project: Hive Issue Type: Bug Reporter: Denys Kuzmenko
[jira] [Comment Edited] (HIVE-27898) HIVE4 can't use ICEBERG table in subqueries
[ https://issues.apache.org/jira/browse/HIVE-27898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17819145#comment-17819145 ] yongzhi.shao edited comment on HIVE-27898 at 2/21/24 9:49 AM: -- [~zhangbutao]: Hello. I re-tested with branch-4.0 code and the problem still exists. Do I need to test with the master branch? Also, do we need to add this issue to HIVE-27945 (Add 4.0 known issues into TSB) if we can't backport the associated patch when 4.0.0 is released?
{code:java}
-- spark3.4.1+iceberg 1.4.3
CREATE TABLE IF NOT EXISTS datacenter.test.test_data_02 (
  id string, name string
)
PARTITIONED BY (name)
TBLPROPERTIES ('read.orc.vectorization.enabled'='true','write.format.default'='orc','write.orc.bloom.filter.columns'='id','write.orc.compression-codec'='zstd','write.metadata.previous-versions-max'='3','write.metadata.delete-after-commit.enabled'='true')
STORED AS iceberg;

insert into datacenter.test.test_data_02(id,name) values('1','a'),('2','b');

-- hive 4.0.0-beta-2 (4.0 branch)
CREATE EXTERNAL TABLE iceberg_dwd.test_data_02
STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler'
LOCATION 'hdfs://xx/iceberg-catalog/warehouse/test/test_data_02'
TBLPROPERTIES ('iceberg.catalog'='location_based_table','engine.hive.enabled'='true');

select id from (select * from iceberg_dwd.test_data_02) s1;            -- 2 rows
select name from (select * from iceberg_dwd.test_data_02 limit 10) s1; -- 2 rows
select id from (select * from iceberg_dwd.test_data_02 limit 10) s1;   -- all rows are null
{code}
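One variable worth isolating when reproducing this (an assumption about where to look, not a confirmed cause) is vectorized execution, since the Iceberg table is created with 'read.orc.vectorization.enabled'='true'; it can be toggled per session:
{code:sql}
-- Re-run the failing query with vectorization off to see whether the NULLs persist
SET hive.vectorized.execution.enabled=false;
SELECT id FROM (SELECT * FROM iceberg_dwd.test_data_02 LIMIT 10) s1;
{code}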
[jira] [Comment Edited] (HIVE-27898) HIVE4 can't use ICEBERG table in subqueries
[ https://issues.apache.org/jira/browse/HIVE-27898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17819157#comment-17819157 ] yongzhi.shao edited comment on HIVE-27898 at 2/21/24 9:47 AM: -- A more interesting phenomenon:
{code:java}
0: jdbc:hive2://xxx:1> select * from iceberg_dwd.test_data_02 limit 10;
test_data_02.id    1
test_data_02.name  a
test_data_02.id    2
test_data_02.name  b
0: jdbc:hive2://xx:1> select * from (select * from iceberg_dwd.test_data_02 limit 10) s1;
s1.id    NULL
s1.name  a
s1.id    NULL
s1.name  b
{code}
Where did the data in the id column go?
[jira] [Commented] (HIVE-28071) Sync jetty version across modules
[ https://issues.apache.org/jira/browse/HIVE-28071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17819159#comment-17819159 ] Ayush Saxena commented on HIVE-28071: - Committed to master. Thanx [~Aggarwal_Raghav] for the contribution!!! > Sync jetty version across modules > - > > Key: HIVE-28071 > URL: https://issues.apache.org/jira/browse/HIVE-28071 > Project: Hive > Issue Type: Improvement >Reporter: Raghav Aggarwal >Assignee: Raghav Aggarwal >Priority: Minor > Labels: pull-request-available > > In pom.xml, we have jetty version 9.4.45.v20220203 but in > standalone-metastore/pom.xml and itests/qtest-druid/pom.xml we have jetty > version 9.4.40.v20210413. > Looks like it was skipped during HIVE-27757
[jira] [Resolved] (HIVE-28071) Sync jetty version across modules
[ https://issues.apache.org/jira/browse/HIVE-28071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ayush Saxena resolved HIVE-28071. - Fix Version/s: 4.1.0 Resolution: Fixed > Sync jetty version across modules > - > > Key: HIVE-28071 > URL: https://issues.apache.org/jira/browse/HIVE-28071 > Project: Hive > Issue Type: Improvement >Reporter: Raghav Aggarwal >Assignee: Raghav Aggarwal >Priority: Minor > Labels: pull-request-available > Fix For: 4.1.0
[jira] [Comment Edited] (HIVE-27898) HIVE4 can't use ICEBERG table in subqueries
[ https://issues.apache.org/jira/browse/HIVE-27898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17819152#comment-17819152 ] yongzhi.shao edited comment on HIVE-27898 at 2/21/24 9:36 AM: -- [~zhangbutao] Hello. I'm now testing with the 4.0 branch I just compiled; most of the problems are indeed fixed, but this one SQL problem remains.
{code:java}
0: jdbc:hive2://x:1> select version();
_c0  4.0.0-beta-2-SNAPSHOT rf355c82a5aa77ef1496b35c22b8ac9b84dfe1780

select name from (select * from iceberg_dwd.test_data_02 limit 10) s1; -- 2 rows
select id from (select * from iceberg_dwd.test_data_02 limit 10) s1;   -- all rows are null
{code}
Can't I select a column named id from the results of a subquery?
[jira] [Comment Edited] (HIVE-27898) HIVE4 can't use ICEBERG table in subqueries
[ https://issues.apache.org/jira/browse/HIVE-27898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17819152#comment-17819152 ] yongzhi.shao edited comment on HIVE-27898 at 2/21/24 9:32 AM: -- [~zhangbutao] Hello. However, I'm now testing with the 4.0 branch I just compiled, and most of the problems are indeed fixed, but now there is only this SQL problem. {code:java} 0: jdbc:hive2://x:1> select version(); _c0 4.0.0-beta-2-SNAPSHOT rf355c82a5aa77ef1496b35c22b8ac9b84dfe1780 select id from (select * from iceberg_dwd.test_data_02 limit 10) s1; – all row is null{code} Can't I filter a column named ID from the results of a subquery? was (Author: lisoda): [~zhangbutao] Hello. However, I'm now testing with the 4.0 branch I just compiled, and most of the problems are indeed fixed, but now there is only this SQL problem. {code:java} select id from (select * from iceberg_dwd.test_data_02 limit 10) s1; – all row is null{code} Can't I filter a column named ID from the results of a subquery? > HIVE4 can't use ICEBERG table in subqueries > --- > > Key: HIVE-27898 > URL: https://issues.apache.org/jira/browse/HIVE-27898 > Project: Hive > Issue Type: Improvement > Components: Iceberg integration >Affects Versions: 4.0.0-beta-1 >Reporter: yongzhi.shao >Priority: Critical > > Currently, we found that when using HIVE4-BETA1 version, if we use ICEBERG > table in the subquery, we can't get any data in the end. > I have used HIVE3-TEZ for cross validation and HIVE3 does not have this > problem when querying ICEBERG. 
> {code:java}
> --spark3.4.1+iceberg 1.4.2
> CREATE TABLE datacenter.dwd.b_std_trade (
>   uni_order_id STRING, data_from BIGINT, partner STRING, plat_code STRING,
>   order_id STRING, uni_shop_id STRING, uni_id STRING, guide_id STRING,
>   shop_id STRING, plat_account STRING, total_fee DOUBLE,
>   item_discount_fee DOUBLE, trade_discount_fee DOUBLE, adjust_fee DOUBLE,
>   post_fee DOUBLE, discount_rate DOUBLE, payment_no_postfee DOUBLE,
>   payment DOUBLE, pay_time STRING, product_num BIGINT, order_status STRING,
>   is_refund STRING, refund_fee DOUBLE, insert_time STRING, created STRING,
>   endtime STRING, modified STRING, trade_type STRING, receiver_name STRING,
>   receiver_country STRING, receiver_state STRING, receiver_city STRING,
>   receiver_district STRING, receiver_town STRING, receiver_address STRING,
>   receiver_mobile STRING, trade_source STRING, delivery_type STRING,
>   consign_time STRING, orders_num BIGINT, is_presale BIGINT,
>   presale_status STRING, first_fee_paytime STRING, last_fee_paytime STRING,
>   first_paid_fee DOUBLE, tenant STRING, tidb_modified STRING,
>   step_paid_fee DOUBLE, seller_flag STRING, is_used_store_card BIGINT,
>   store_card_used DOUBLE, store_card_basic_used DOUBLE,
>   store_card_expand_used DOUBLE, order_promotion_num BIGINT,
>   item_promotion_num BIGINT, buyer_remark STRING, seller_remark STRING,
>   trade_business_type STRING)
> USING iceberg
> PARTITIONED BY (uni_shop_id, truncate(4, created))
> LOCATION '/iceberg-catalog/warehouse/dwd/b_std_trade'
> TBLPROPERTIES (
>   'current-snapshot-id' = '7217819472703702905',
>   'format' = 'iceberg/orc',
>   'format-version' = '1',
>   'hive.stored-as' = 'iceberg',
>   'read.orc.vectorization.enabled' = 'true',
>   'sort-order' = 'uni_shop_id ASC NULLS FIRST, created ASC NULLS FIRST',
>   'write.distribution-mode' = 'hash',
>   'write.format.default' = 'orc',
>   'write.metadata.delete-after-commit.enabled' = 'true',
>   'write.metadata.previous-versions-max' = '3',
>   'write.orc.bloom.filter.columns' = 'order_id',
>   'write.orc.compression-codec' = 'zstd')
>
> --hive-iceberg
> CREATE EXTERNAL TABLE iceberg_dwd.b_std_trade
> STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler'
> LOCATION 'hdfs:///iceberg-catalog/warehouse/dwd/b_std_trade'
> TBLPROPERTIES ('iceberg.catalog'='location_based_table','engine.hive.enabled'='true');
>
> select * from iceberg_dwd.b_std_trade
> where uni_shop_id = 'TEST|1' limit 10   --10 rows
>
> select *
> from (
>   select * from iceberg_dwd.b_std_trade
>   where uni_shop_id = 'TEST|1' limit 10
> ) t1;   --10 rows
>
> select uni_shop_id
> from (
>   select * from iceberg_dwd.b_std_trade
>   where uni_shop_id = 'TEST|1' limit 10
> ) t1;   --0 rows
>
> select uni_shop_id
> from (
>   select uni_shop_id as uni_shop_id from iceberg_dwd.b_std_trade
>   where uni_shop_id = 'TEST|1' limit 10
> ) t1;   --0 rows
>
> --hive-orc
> select uni_shop_id
> from (
>   select * from iceberg_dwd.trade_test
>   where uni_shop_id = 'TEST|1' limit 10
> ) t1;   --10 rows{code}
--
This message was sent by Atlassian Jira (v8.20.10#820010)
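One way to localize this kind of wrong-results bug is to compare the optimized plans of the failing and working variants and see at which operator the projected column is dropped. A hedged diagnostic sketch, assuming the `iceberg_dwd.b_std_trade` table defined above (not part of the original report):

```sql
-- Compare the plan of the failing single-column projection...
EXPLAIN
SELECT uni_shop_id
FROM (SELECT * FROM iceberg_dwd.b_std_trade
      WHERE uni_shop_id = 'TEST|1' LIMIT 10) t1;   -- variant that returns 0 rows

-- ...with the plan of the working SELECT * variant.
EXPLAIN
SELECT *
FROM (SELECT * FROM iceberg_dwd.b_std_trade
      WHERE uni_shop_id = 'TEST|1' LIMIT 10) t1;   -- variant that returns 10 rows
```

If the plans diverge in the Select/Limit operators above the Iceberg table scan, that points at column pruning through the LIMIT subquery rather than at the scan itself.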
[jira] [Comment Edited] (HIVE-27898) HIVE4 can't use ICEBERG table in subqueries
[ https://issues.apache.org/jira/browse/HIVE-27898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17819145#comment-17819145 ] yongzhi.shao edited comment on HIVE-27898 at 2/21/24 9:27 AM:
--
[~zhangbutao]: Hello. I re-tested with branch-4.0 code and the problem still exists. Do I need to test with the master branch? Also, if we can't backport the associated patch before 4.0.0 is released, do we need to add this issue to HIVE-27945 (Add 4.0 known issues into TSB)?
{code:java}
--spark3.4.1+iceberg 1.4.3
CREATE TABLE IF NOT EXISTS datacenter.test.test_data_02 (
  id string, name string
)
PARTITIONED BY (name)
TBLPROPERTIES (
  'read.orc.vectorization.enabled'='true',
  'write.format.default'='orc',
  'write.orc.bloom.filter.columns'='id',
  'write.orc.compression-codec'='zstd',
  'write.metadata.previous-versions-max'='3',
  'write.metadata.delete-after-commit.enabled'='true')
STORED AS iceberg;

insert into datacenter.test.test_data_02(id,name) values('1','a'),('2','b');

--hive 4.0.0-beta-2 (4.0 branch)
CREATE EXTERNAL TABLE iceberg_dwd.test_data_02
STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler'
LOCATION 'hdfs://xx/iceberg-catalog/warehouse/test/test_data_02'
TBLPROPERTIES ('iceberg.catalog'='location_based_table','engine.hive.enabled'='true');

select id from (select * from iceberg_dwd.test_data_02) s1;            -- 2 rows
select name from (select * from iceberg_dwd.test_data_02 limit 10) s1; -- 2 rows
select id from (select * from iceberg_dwd.test_data_02 limit 10) s1;   -- all rows are null
{code}
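For reference, the expected ANSI-SQL semantics of the repro above — projecting one column out of a `LIMIT` subquery must preserve its values — can be checked against any SQL engine. A minimal sketch using SQLite in place of Hive/Iceberg (purely illustrative; the bug itself is specific to Hive's Iceberg reader):

```python
# Illustrative only: the same table shape and query pattern as the Hive repro,
# run on SQLite to show what any conforming engine should return.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test_data_02 (id TEXT, name TEXT)")
conn.executemany("INSERT INTO test_data_02 VALUES (?, ?)",
                 [("1", "a"), ("2", "b")])

# Projecting a column through a LIMIT subquery keeps its values intact.
rows = conn.execute(
    "SELECT id FROM (SELECT * FROM test_data_02 LIMIT 10) s1 ORDER BY id"
).fetchall()
print(rows)  # [('1',), ('2',)] -- never NULLs
```

On Hive 4.0.0-beta-1 the equivalent query returns NULL for every row, which is the defect being reported.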
[jira] [Commented] (HIVE-27898) HIVE4 can't use ICEBERG table in subqueries
[ https://issues.apache.org/jira/browse/HIVE-27898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17819150#comment-17819150 ] Butao Zhang commented on HIVE-27898:
--
https://issues.apache.org/jira/browse/HIVE-27898?focusedCommentId=17790407=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17790407 branch-4.0 has all the important fixes. Based on the test I did before, I don't think branch-4.0 has this issue.
[jira] [Commented] (HIVE-27898) HIVE4 can't use ICEBERG table in subqueries
[ https://issues.apache.org/jira/browse/HIVE-27898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17819145#comment-17819145 ]

yongzhi.shao commented on HIVE-27898:
-------------------------------------

[~zhangbutao]: Hello. I have now re-tested with the branch-4.0 code and the problem still exists. Do I need to test with the master branch?

> HIVE4 can't use ICEBERG table in subqueries
> -------------------------------------------
>
>                 Key: HIVE-27898
>                 URL: https://issues.apache.org/jira/browse/HIVE-27898
>             Project: Hive
>          Issue Type: Improvement
>          Components: Iceberg integration
>    Affects Versions: 4.0.0-beta-1
>            Reporter: yongzhi.shao
>            Priority: Critical
>
> Currently we find that, when using the HIVE4-BETA1 version, if we use an ICEBERG table in a subquery, we cannot get any data back.
> I have used HIVE3-TEZ for cross-validation, and HIVE3 does not have this problem when querying ICEBERG.
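Since the failure reproduces on branch-4.0, anyone blocked by this before a fix lands might try isolating the planner path involved. A hedged experiment (not verified anywhere in this thread, and not a recommendation from the reporters) is to turn off the fetch-task conversion and the CBO rewrite, both of which are real Hive settings that change how a simple limited subquery is planned:

{code:sql}
-- Assumption: these settings only alter the planner path; whether either one
-- actually avoids the 0-row result for this bug has NOT been confirmed.
set hive.fetch.task.conversion=none;  -- force a full execution task instead of a fetch task
set hive.cbo.enable=false;            -- bypass the Calcite-based plan rewrite

select uni_shop_id
from (
  select * from iceberg_dwd.b_std_trade
  where uni_shop_id = 'TEST|1' limit 10
) t1;
{code}

If the query returns rows with either setting, that narrows the regression to the corresponding optimizer stage; if not, the column-pruning path of the Iceberg reader itself becomes the more likely suspect.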