from:"qingfa zhou \\\(Jira\\\)"

[jira] [Created] (HIVE-23165) Hive On Spark left join and right join generated inconsistent data

2020-04-09 Thread qingfa zhou (Jira)

qingfa zhou created HIVE-23165:
--

 Summary:  Hive On Spark left join and right join generated 
inconsistent data
 Key: HIVE-23165
 URL: https://issues.apache.org/jira/browse/HIVE-23165
 Project: Hive
  Issue Type: Bug
  Components: Hive
Affects Versions: 2.2.0
 Environment: hive :2.3.0

spark:2.2.0

hadoop:2.7.3
Reporter: qingfa zhou
Assignee: Xuefu Zhang


*1)This is my sql.*

with delivery_day as (
 select * from (
 select dt,warehouse_code,b.sku_main_code,b.out_warehouse_code,b.is_pici_order
 from data_smartorder.dm_ordering_information_system_order_detail_parse t
 lateral view 
json_tuple(t.information_info,'warehouse_code','sku_main_code','调出仓','是否预付商品')b 
as warehouse_code,sku_main_code,out_warehouse_code,is_pici_order
 where dt=date_format(date_sub(current_date,1),'MMdd')
 and l1_category_name='策略配置'
 and l2_category_name='pb仓库补货仓品维度新'
 and b.is_pici_order='1'
 )t
),

avg_sale_7 as (
 select *,sku_sale_quantity+first_dilivery_quantity as avg_sale_7
 from (
 select t1.warehouse_code,t1.warehouse_name,t1.sku_main_code,t1.sku_name 
sku_main_name,
 sum(t1.warehouse_dispatch_quantity) as warehouse_dispatch_quantity,
 sum(t1.sku_sale_quantity) as sku_sale_quantity,
 sum(t1.first_dilivery_quantity) as first_dilivery_quantity
 from data_smartorder.dw_ordering_warehouse_sku_cargo_delivery_data_di t1
 where t1.dt=date_format(date_sub(current_date,1),'MMdd')
 group by t1.warehouse_code,t1.warehouse_name,t1.sku_main_code,t1.sku_name
 )t
)

 select t1.warehouse_code,t1.sku_main_code,t1.out_warehouse_code,
 t2.avg_sale_7
 from delivery_day t1
 left join avg_sale_7 t2
 on t1.warehouse_code=t2.warehouse_code
 and t1.sku_main_code=t2.sku_main_code
 where t1.sku_main_code='37010832'
 and t1.out_warehouse_code='1011';

left join and right join generated inconsistent data.

2) result in the left join 
7001  37010832  1011  26.8572
1011  37010832  1011  130.2858
2002  37010832  1011  40
1701  37010832  1011  NULL

3) result in the right join 
1011  37010832  1011  65.1429
2002  37010832  1011  20
7001  37010832  1011  13.4286

Inconsistent results in last column，'right join' 's result is right.But the 
results of hive on tez and sparksql are consistent and is true.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Created] (HIVE-22181) Same query but different result occasionally

2019-09-09 Thread qingfa zhou (Jira)

qingfa zhou created HIVE-22181:
--

 Summary: Same query but different result occasionally
 Key: HIVE-22181
 URL: https://issues.apache.org/jira/browse/HIVE-22181
 Project: Hive
  Issue Type: Bug
  Components: Hive
Affects Versions: 2.3.0
 Environment: hadoop：2.7.3.2.6.1.0-129

hive：2.3.0

tez：0.8.4

java：1.8.0_212 
Reporter: qingfa zhou
 Attachments: WX20190910-123604.png, WX20190910-123621.png, 
WX20190910-123641.png

h3. 1.Phenomenon

    Running the same SQL(hive on tez) several times will result in inconsistent 
results . The following results are generated respectively

    1)

     OK

     3951864 808

    2)

     OK

     1822979 353

    But the first result is true.
h3. 2.SQL

select
 count(1),count(distinct store_code)
 from 
 (
 select
 store_code,
 all_pay_id,
 payable_price,
 row_number()over(partition by store_code order by payable_price desc) as rw,
 count(1)over(partition by store_code) as store_user
 from 
 (
 select 
 store_code,
 all_pay_id,
 sum(payable_price) as payable_price
 from data_promotion.mdw_user_promotion_shopping_behaivor_four_week_detail a
 where dt='20190904'
 group by store_code,all_pay_id
 ) a 
 order by rw 
 ) a 
 where rw/store_user<=0.8
 ;

 
h3. 3.Troubleshooting

    I lookup task result from tez web ui，give the result as follows：

           the first: !image-2019-09-10-12-16-45-385.png!

        the second:

!image-2019-09-10-12-18-08-769.png!

      so , The second RECORDS_OUT_INTERMEDIATE_Reducer_4's result is much lower 
than the first.

     This is detailed info of the second result from reducer_3 to reducer_4 

!image-2019-09-10-12-26-41-681.png!  

This is execution plan of the sql:

OK
Vertex dependency in root stage
Reducer 2 <- Map 1 (SIMPLE_EDGE)
Reducer 3 <- Reducer 2 (SIMPLE_EDGE)
Reducer 4 <- Reducer 3 (SIMPLE_EDGE)
Reducer 5 <- Reducer 4 (SIMPLE_EDGE)
Reducer 6 <- Reducer 5 (SIMPLE_EDGE)

Stage-0
 Fetch Operator
 limit:-1
 Stage-1
 Reducer 6
 File Output Operator [FS_24]
 Group By Operator [GBY_22] (rows=1 width=128)
 Output:["_col0","_col1"],aggregations:["count(VALUE._col0)","count(DISTINCT 
KEY._col0:0._col0)"]
 <-Reducer 5 [SIMPLE_EDGE]
 SHUFFLE [RS_21]
 Group By Operator [GBY_20] (rows=5619870 width=2328)
 Output:["_col0","_col1","_col2"],aggregations:["count(1)","count(DISTINCT 
_col0)"],keys:_col0
 Select Operator [SEL_17] (rows=5619870 width=2328)
 Output:["_col0"]
 <-Reducer 4 [SIMPLE_EDGE]
 SHUFFLE [RS_16]
 Select Operator [SEL_14] (rows=5619870 width=2328)
 Output:["_col0","_col3"]
 Filter Operator [FIL_25] (rows=5619870 width=2328)
 predicate:((_col0 / count_window_1) <= 0.8)
 PTF Operator [PTF_13] (rows=16859610 width=2328)
 Function definitions:[{},\{"name:":"windowingtablefunction","order by:":"_col1 
ASC NULLS FIRST","partition by:":"_col1"}]
 Select Operator [SEL_12] (rows=16859610 width=2328)
 Output:["_col0","_col1"]
 <-Reducer 3 [SIMPLE_EDGE]
 SHUFFLE [RS_11]
 PartitionCols:_col0
 Select Operator [SEL_10] (rows=16859610 width=2328)
 Output:["_col0","row_number_window_0"]
 PTF Operator [PTF_9] (rows=16859610 width=2328)
 Function definitions:[{},\{"name:":"windowingtablefunction","order by:":"_col2 
DESC NULLS LAST","partition by:":"_col0"}]
 Select Operator [SEL_8] (rows=16859610 width=2328)
 Output:["_col0","_col2"]
 <-Reducer 2 [SIMPLE_EDGE]
 SHUFFLE [RS_7]
 PartitionCols:_col0
 Select Operator [SEL_6] (rows=16859610 width=2328)
 Output:["_col0","_col2"]
 Group By Operator [GBY_5] (rows=16859610 width=2328)
 
Output:["_col0","_col1","_col2"],aggregations:["sum(VALUE._col0)"],keys:KEY._col0,
 KEY._col1
 <-Map 1 [SIMPLE_EDGE]
 SHUFFLE [RS_4]
 PartitionCols:_col0, _col1
 Group By Operator [GBY_3] (rows=33719220 width=2328)
 
Output:["_col0","_col1","_col2"],aggregations:["sum(payable_price)"],keys:store_code,
 all_pay_id
 Select Operator [SEL_2] (rows=33719220 width=2328)
 Output:["store_code","all_pay_id","payable_price"]
 TableScan [TS_0] (rows=33719220 width=2328)
 
data_promotion@mdw_user_promotion_shopping_behaivor_four_week_detail,a,Tbl:COMPLETE,Col:NONE,Output:["all_pay_id","store_code","payable_price"]

 

    Reducer 4 show PTF , so I don't know if this result has anything to do with 
PTF? Can you help me.

 

 



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

[jira] [Created] (HIVE-22180) Same query but different result occasionally

2019-09-09 Thread qingfa zhou (Jira)

qingfa zhou created HIVE-22180:
--

 Summary: Same query but different result occasionally
 Key: HIVE-22180
 URL: https://issues.apache.org/jira/browse/HIVE-22180
 Project: Hive
  Issue Type: Bug
  Components: Hive
Affects Versions: 2.3.0
 Environment: hadoop：2.7.3.2.6.1.0-129

hive：2.3.0

tez：0.8.4

java：1.8.0_212 
Reporter: qingfa zhou
 Attachments: image-2019-09-10-12-16-45-385.png, 
image-2019-09-10-12-17-34-779.png, image-2019-09-10-12-18-08-769.png, 
image-2019-09-10-12-26-41-681.png

h3. 1.Phenomenon

    Running the same SQL(hive on tez) several times will result in inconsistent 
results . The following results are generated respectively

    1)

     OK

     3951864 808

    2)

     OK

     1822979 353

    But the first result is true.
h3. 2.SQL

select
 count(1),count(distinct store_code)
 from 
 (
 select
 store_code,
 all_pay_id,
 payable_price,
 row_number()over(partition by store_code order by payable_price desc) as rw,
 count(1)over(partition by store_code) as store_user
 from 
 (
 select 
 store_code,
 all_pay_id,
 sum(payable_price) as payable_price
 from data_promotion.mdw_user_promotion_shopping_behaivor_four_week_detail a
 where dt='20190904'
 group by store_code,all_pay_id
 ) a 
 order by rw 
 ) a 
 where rw/store_user<=0.8
 ;

 
h3. 3.Troubleshooting

    I lookup task result from tez web ui，give the result as follows：

           the first: !image-2019-09-10-12-16-45-385.png!

        the second:

!image-2019-09-10-12-18-08-769.png!

      so , The second RECORDS_OUT_INTERMEDIATE_Reducer_4's result is much lower 
than the first.

     This is detailed info of the second result from reducer_3 to reducer_4 

!image-2019-09-10-12-26-41-681.png!  

This is execution plan of the sql:

OK
Vertex dependency in root stage
Reducer 2 <- Map 1 (SIMPLE_EDGE)
Reducer 3 <- Reducer 2 (SIMPLE_EDGE)
Reducer 4 <- Reducer 3 (SIMPLE_EDGE)
Reducer 5 <- Reducer 4 (SIMPLE_EDGE)
Reducer 6 <- Reducer 5 (SIMPLE_EDGE)

Stage-0
 Fetch Operator
 limit:-1
 Stage-1
 Reducer 6
 File Output Operator [FS_24]
 Group By Operator [GBY_22] (rows=1 width=128)
 Output:["_col0","_col1"],aggregations:["count(VALUE._col0)","count(DISTINCT 
KEY._col0:0._col0)"]
 <-Reducer 5 [SIMPLE_EDGE]
 SHUFFLE [RS_21]
 Group By Operator [GBY_20] (rows=5619870 width=2328)
 Output:["_col0","_col1","_col2"],aggregations:["count(1)","count(DISTINCT 
_col0)"],keys:_col0
 Select Operator [SEL_17] (rows=5619870 width=2328)
 Output:["_col0"]
 <-Reducer 4 [SIMPLE_EDGE]
 SHUFFLE [RS_16]
 Select Operator [SEL_14] (rows=5619870 width=2328)
 Output:["_col0","_col3"]
 Filter Operator [FIL_25] (rows=5619870 width=2328)
 predicate:((_col0 / count_window_1) <= 0.8)
 PTF Operator [PTF_13] (rows=16859610 width=2328)
 Function definitions:[{},\{"name:":"windowingtablefunction","order by:":"_col1 
ASC NULLS FIRST","partition by:":"_col1"}]
 Select Operator [SEL_12] (rows=16859610 width=2328)
 Output:["_col0","_col1"]
 <-Reducer 3 [SIMPLE_EDGE]
 SHUFFLE [RS_11]
 PartitionCols:_col0
 Select Operator [SEL_10] (rows=16859610 width=2328)
 Output:["_col0","row_number_window_0"]
 PTF Operator [PTF_9] (rows=16859610 width=2328)
 Function definitions:[{},\{"name:":"windowingtablefunction","order by:":"_col2 
DESC NULLS LAST","partition by:":"_col0"}]
 Select Operator [SEL_8] (rows=16859610 width=2328)
 Output:["_col0","_col2"]
 <-Reducer 2 [SIMPLE_EDGE]
 SHUFFLE [RS_7]
 PartitionCols:_col0
 Select Operator [SEL_6] (rows=16859610 width=2328)
 Output:["_col0","_col2"]
 Group By Operator [GBY_5] (rows=16859610 width=2328)
 
Output:["_col0","_col1","_col2"],aggregations:["sum(VALUE._col0)"],keys:KEY._col0,
 KEY._col1
 <-Map 1 [SIMPLE_EDGE]
 SHUFFLE [RS_4]
 PartitionCols:_col0, _col1
 Group By Operator [GBY_3] (rows=33719220 width=2328)
 
Output:["_col0","_col1","_col2"],aggregations:["sum(payable_price)"],keys:store_code,
 all_pay_id
 Select Operator [SEL_2] (rows=33719220 width=2328)
 Output:["store_code","all_pay_id","payable_price"]
 TableScan [TS_0] (rows=33719220 width=2328)
 
data_promotion@mdw_user_promotion_shopping_behaivor_four_week_detail,a,Tbl:COMPLETE,Col:NONE,Output:["all_pay_id","store_code","payable_price"]

 

    Reducer 4 show PTF , so I don't know if this result has anything to do with 
PTF? Can you help me.

 

 



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

[jira] [Created] (HIVE-23165) Hive On Spark left join and right join generated inconsistent data

[jira] [Created] (HIVE-22181) Same query but different result occasionally

[jira] [Created] (HIVE-22180) Same query but different result occasionally

3 matches

Site Navigation

Mail list logo

Footer information