[jira] [Created] (HIVE-23165) Hive On Spark left join and right join generated inconsistent data
qingfa zhou created HIVE-23165: -- Summary: Hive On Spark left join and right join generated inconsistent data Key: HIVE-23165 URL: https://issues.apache.org/jira/browse/HIVE-23165 Project: Hive Issue Type: Bug Components: Hive Affects Versions: 2.2.0 Environment: hive :2.3.0 spark:2.2.0 hadoop:2.7.3 Reporter: qingfa zhou Assignee: Xuefu Zhang *1)This is my sql.* with delivery_day as ( select * from ( select dt,warehouse_code,b.sku_main_code,b.out_warehouse_code,b.is_pici_order from data_smartorder.dm_ordering_information_system_order_detail_parse t lateral view json_tuple(t.information_info,'warehouse_code','sku_main_code','调出仓','是否预付商品')b as warehouse_code,sku_main_code,out_warehouse_code,is_pici_order where dt=date_format(date_sub(current_date,1),'MMdd') and l1_category_name='策略配置' and l2_category_name='pb仓库补货仓品维度新' and b.is_pici_order='1' )t ), avg_sale_7 as ( select *,sku_sale_quantity+first_dilivery_quantity as avg_sale_7 from ( select t1.warehouse_code,t1.warehouse_name,t1.sku_main_code,t1.sku_name sku_main_name, sum(t1.warehouse_dispatch_quantity) as warehouse_dispatch_quantity, sum(t1.sku_sale_quantity) as sku_sale_quantity, sum(t1.first_dilivery_quantity) as first_dilivery_quantity from data_smartorder.dw_ordering_warehouse_sku_cargo_delivery_data_di t1 where t1.dt=date_format(date_sub(current_date,1),'MMdd') group by t1.warehouse_code,t1.warehouse_name,t1.sku_main_code,t1.sku_name )t ) select t1.warehouse_code,t1.sku_main_code,t1.out_warehouse_code, t2.avg_sale_7 from delivery_day t1 left join avg_sale_7 t2 on t1.warehouse_code=t2.warehouse_code and t1.sku_main_code=t2.sku_main_code where t1.sku_main_code='37010832' and t1.out_warehouse_code='1011'; left join and right join generated inconsistent data. 2) result in the left join 7001 37010832 1011 26.8572 1011 37010832 1011 130.2858 2002 37010832 1011 40 1701 37010832 1011 NULL 3) result in the right join 1011 37010832 1011 65.1429 2002 37010832 1011 20 7001 37010832 1011 13.4286 Inconsistent results in last column,'right join' 's result is right.But the results of hive on tez and sparksql are consistent and is true. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-22181) Same query but different result occasionally
qingfa zhou created HIVE-22181: -- Summary: Same query but different result occasionally Key: HIVE-22181 URL: https://issues.apache.org/jira/browse/HIVE-22181 Project: Hive Issue Type: Bug Components: Hive Affects Versions: 2.3.0 Environment: hadoop:2.7.3.2.6.1.0-129 hive:2.3.0 tez:0.8.4 java:1.8.0_212 Reporter: qingfa zhou Attachments: WX20190910-123604.png, WX20190910-123621.png, WX20190910-123641.png h3. 1.Phenomenon Running the same SQL(hive on tez) several times will result in inconsistent results . The following results are generated respectively 1) OK 3951864 808 2) OK 1822979 353 But the first result is true. h3. 2.SQL select count(1),count(distinct store_code) from ( select store_code, all_pay_id, payable_price, row_number()over(partition by store_code order by payable_price desc) as rw, count(1)over(partition by store_code) as store_user from ( select store_code, all_pay_id, sum(payable_price) as payable_price from data_promotion.mdw_user_promotion_shopping_behaivor_four_week_detail a where dt='20190904' group by store_code,all_pay_id ) a order by rw ) a where rw/store_user<=0.8 ; h3. 3.Troubleshooting I lookup task result from tez web ui,give the result as follows: the first: !image-2019-09-10-12-16-45-385.png! the second: !image-2019-09-10-12-18-08-769.png! so , The second RECORDS_OUT_INTERMEDIATE_Reducer_4's result is much lower than the first. This is detailed info of the second result from reducer_3 to reducer_4 !image-2019-09-10-12-26-41-681.png! This is execution plan of the sql: OK Vertex dependency in root stage Reducer 2 <- Map 1 (SIMPLE_EDGE) Reducer 3 <- Reducer 2 (SIMPLE_EDGE) Reducer 4 <- Reducer 3 (SIMPLE_EDGE) Reducer 5 <- Reducer 4 (SIMPLE_EDGE) Reducer 6 <- Reducer 5 (SIMPLE_EDGE) Stage-0 Fetch Operator limit:-1 Stage-1 Reducer 6 File Output Operator [FS_24] Group By Operator [GBY_22] (rows=1 width=128) Output:["_col0","_col1"],aggregations:["count(VALUE._col0)","count(DISTINCT KEY._col0:0._col0)"] <-Reducer 5 [SIMPLE_EDGE] SHUFFLE [RS_21] Group By Operator [GBY_20] (rows=5619870 width=2328) Output:["_col0","_col1","_col2"],aggregations:["count(1)","count(DISTINCT _col0)"],keys:_col0 Select Operator [SEL_17] (rows=5619870 width=2328) Output:["_col0"] <-Reducer 4 [SIMPLE_EDGE] SHUFFLE [RS_16] Select Operator [SEL_14] (rows=5619870 width=2328) Output:["_col0","_col3"] Filter Operator [FIL_25] (rows=5619870 width=2328) predicate:((_col0 / count_window_1) <= 0.8) PTF Operator [PTF_13] (rows=16859610 width=2328) Function definitions:[{},\{"name:":"windowingtablefunction","order by:":"_col1 ASC NULLS FIRST","partition by:":"_col1"}] Select Operator [SEL_12] (rows=16859610 width=2328) Output:["_col0","_col1"] <-Reducer 3 [SIMPLE_EDGE] SHUFFLE [RS_11] PartitionCols:_col0 Select Operator [SEL_10] (rows=16859610 width=2328) Output:["_col0","row_number_window_0"] PTF Operator [PTF_9] (rows=16859610 width=2328) Function definitions:[{},\{"name:":"windowingtablefunction","order by:":"_col2 DESC NULLS LAST","partition by:":"_col0"}] Select Operator [SEL_8] (rows=16859610 width=2328) Output:["_col0","_col2"] <-Reducer 2 [SIMPLE_EDGE] SHUFFLE [RS_7] PartitionCols:_col0 Select Operator [SEL_6] (rows=16859610 width=2328) Output:["_col0","_col2"] Group By Operator [GBY_5] (rows=16859610 width=2328) Output:["_col0","_col1","_col2"],aggregations:["sum(VALUE._col0)"],keys:KEY._col0, KEY._col1 <-Map 1 [SIMPLE_EDGE] SHUFFLE [RS_4] PartitionCols:_col0, _col1 Group By Operator [GBY_3] (rows=33719220 width=2328) Output:["_col0","_col1","_col2"],aggregations:["sum(payable_price)"],keys:store_code, all_pay_id Select Operator [SEL_2] (rows=33719220 width=2328) Output:["store_code","all_pay_id","payable_price"] TableScan [TS_0] (rows=33719220 width=2328) data_promotion@mdw_user_promotion_shopping_behaivor_four_week_detail,a,Tbl:COMPLETE,Col:NONE,Output:["all_pay_id","store_code","payable_price"] Reducer 4 show PTF , so I don't know if this result has anything to do with PTF? Can you help me. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Created] (HIVE-22180) Same query but different result occasionally
qingfa zhou created HIVE-22180: -- Summary: Same query but different result occasionally Key: HIVE-22180 URL: https://issues.apache.org/jira/browse/HIVE-22180 Project: Hive Issue Type: Bug Components: Hive Affects Versions: 2.3.0 Environment: hadoop:2.7.3.2.6.1.0-129 hive:2.3.0 tez:0.8.4 java:1.8.0_212 Reporter: qingfa zhou Attachments: image-2019-09-10-12-16-45-385.png, image-2019-09-10-12-17-34-779.png, image-2019-09-10-12-18-08-769.png, image-2019-09-10-12-26-41-681.png h3. 1.Phenomenon Running the same SQL(hive on tez) several times will result in inconsistent results . The following results are generated respectively 1) OK 3951864 808 2) OK 1822979 353 But the first result is true. h3. 2.SQL select count(1),count(distinct store_code) from ( select store_code, all_pay_id, payable_price, row_number()over(partition by store_code order by payable_price desc) as rw, count(1)over(partition by store_code) as store_user from ( select store_code, all_pay_id, sum(payable_price) as payable_price from data_promotion.mdw_user_promotion_shopping_behaivor_four_week_detail a where dt='20190904' group by store_code,all_pay_id ) a order by rw ) a where rw/store_user<=0.8 ; h3. 3.Troubleshooting I lookup task result from tez web ui,give the result as follows: the first: !image-2019-09-10-12-16-45-385.png! the second: !image-2019-09-10-12-18-08-769.png! so , The second RECORDS_OUT_INTERMEDIATE_Reducer_4's result is much lower than the first. This is detailed info of the second result from reducer_3 to reducer_4 !image-2019-09-10-12-26-41-681.png! This is execution plan of the sql: OK Vertex dependency in root stage Reducer 2 <- Map 1 (SIMPLE_EDGE) Reducer 3 <- Reducer 2 (SIMPLE_EDGE) Reducer 4 <- Reducer 3 (SIMPLE_EDGE) Reducer 5 <- Reducer 4 (SIMPLE_EDGE) Reducer 6 <- Reducer 5 (SIMPLE_EDGE) Stage-0 Fetch Operator limit:-1 Stage-1 Reducer 6 File Output Operator [FS_24] Group By Operator [GBY_22] (rows=1 width=128) Output:["_col0","_col1"],aggregations:["count(VALUE._col0)","count(DISTINCT KEY._col0:0._col0)"] <-Reducer 5 [SIMPLE_EDGE] SHUFFLE [RS_21] Group By Operator [GBY_20] (rows=5619870 width=2328) Output:["_col0","_col1","_col2"],aggregations:["count(1)","count(DISTINCT _col0)"],keys:_col0 Select Operator [SEL_17] (rows=5619870 width=2328) Output:["_col0"] <-Reducer 4 [SIMPLE_EDGE] SHUFFLE [RS_16] Select Operator [SEL_14] (rows=5619870 width=2328) Output:["_col0","_col3"] Filter Operator [FIL_25] (rows=5619870 width=2328) predicate:((_col0 / count_window_1) <= 0.8) PTF Operator [PTF_13] (rows=16859610 width=2328) Function definitions:[{},\{"name:":"windowingtablefunction","order by:":"_col1 ASC NULLS FIRST","partition by:":"_col1"}] Select Operator [SEL_12] (rows=16859610 width=2328) Output:["_col0","_col1"] <-Reducer 3 [SIMPLE_EDGE] SHUFFLE [RS_11] PartitionCols:_col0 Select Operator [SEL_10] (rows=16859610 width=2328) Output:["_col0","row_number_window_0"] PTF Operator [PTF_9] (rows=16859610 width=2328) Function definitions:[{},\{"name:":"windowingtablefunction","order by:":"_col2 DESC NULLS LAST","partition by:":"_col0"}] Select Operator [SEL_8] (rows=16859610 width=2328) Output:["_col0","_col2"] <-Reducer 2 [SIMPLE_EDGE] SHUFFLE [RS_7] PartitionCols:_col0 Select Operator [SEL_6] (rows=16859610 width=2328) Output:["_col0","_col2"] Group By Operator [GBY_5] (rows=16859610 width=2328) Output:["_col0","_col1","_col2"],aggregations:["sum(VALUE._col0)"],keys:KEY._col0, KEY._col1 <-Map 1 [SIMPLE_EDGE] SHUFFLE [RS_4] PartitionCols:_col0, _col1 Group By Operator [GBY_3] (rows=33719220 width=2328) Output:["_col0","_col1","_col2"],aggregations:["sum(payable_price)"],keys:store_code, all_pay_id Select Operator [SEL_2] (rows=33719220 width=2328) Output:["store_code","all_pay_id","payable_price"] TableScan [TS_0] (rows=33719220 width=2328) data_promotion@mdw_user_promotion_shopping_behaivor_four_week_detail,a,Tbl:COMPLETE,Col:NONE,Output:["all_pay_id","store_code","payable_price"] Reducer 4 show PTF , so I don't know if this result has anything to do with PTF? Can you help me. -- This message was sent by Atlassian Jira (v8.3.2#803003)