qingfa zhou created HIVE-23165:
----------------------------------

             Summary:  Hive On Spark left join and right join generated 
inconsistent data
                 Key: HIVE-23165
                 URL: https://issues.apache.org/jira/browse/HIVE-23165
             Project: Hive
          Issue Type: Bug
          Components: Hive
    Affects Versions: 2.2.0
         Environment: hive :2.3.0

spark:2.2.0

hadoop:2.7.3
            Reporter: qingfa zhou
            Assignee: Xuefu Zhang


*1)This is my sql.*

with delivery_day as (
 select * from (
 select dt,warehouse_code,b.sku_main_code,b.out_warehouse_code,b.is_pici_order
 from data_smartorder.dm_ordering_information_system_order_detail_parse t
 lateral view 
json_tuple(t.information_info,'warehouse_code','sku_main_code','调出仓','是否预付商品')b 
as warehouse_code,sku_main_code,out_warehouse_code,is_pici_order
 where dt=date_format(date_sub(current_date,1),'yyyyMMdd')
 and l1_category_name='策略配置'
 and l2_category_name='pb仓库补货仓品维度新'
 and b.is_pici_order='1'
 )t
),

avg_sale_7 as (
 select *,sku_sale_quantity+first_dilivery_quantity as avg_sale_7
 from (
 select t1.warehouse_code,t1.warehouse_name,t1.sku_main_code,t1.sku_name 
sku_main_name,
 sum(t1.warehouse_dispatch_quantity) as warehouse_dispatch_quantity,
 sum(t1.sku_sale_quantity) as sku_sale_quantity,
 sum(t1.first_dilivery_quantity) as first_dilivery_quantity
 from data_smartorder.dw_ordering_warehouse_sku_cargo_delivery_data_di t1
 where t1.dt=date_format(date_sub(current_date,1),'yyyyMMdd')
 group by t1.warehouse_code,t1.warehouse_name,t1.sku_main_code,t1.sku_name
 )t
)

 select t1.warehouse_code,t1.sku_main_code,t1.out_warehouse_code,
 t2.avg_sale_7
 from delivery_day t1
 left join avg_sale_7 t2
 on t1.warehouse_code=t2.warehouse_code
 and t1.sku_main_code=t2.sku_main_code
 where t1.sku_main_code='37010832'
 and t1.out_warehouse_code='1011';

left join and right join generated inconsistent data.

2) result in the left join 
7001  37010832  1011  26.8572
1011  37010832  1011  130.2858
2002  37010832  1011  40
1701  37010832  1011  NULL

3) result in the right join 
1011  37010832  1011  65.1429
2002  37010832  1011  20
7001  37010832  1011  13.4286

Inconsistent results in last column,'right join' 's result is right.But the 
results of hive on tez and sparksql are consistent and is true.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to