Yes. We have not changed our script, and this only appears after we upgraded to 
new version at 24th.

 

 

Previously we’re using HIVE 1.2.0 + TEZ 0.7.0

 

 

Thanks!

 

Chi

 

From: Stephen Sprague [mailto:[email protected]] 
Sent: Tuesday, March 1, 2016 12:31 PM
To: [email protected]
Subject: Re: Wrong column is picked in HIVE 2.0.0 + TEZ 0.8.2 left join

 

very interesting.  so this did work correctly on your previous distribution of 
these two products?  May i ask what they were?

 

On Mon, Feb 29, 2016 at 8:24 PM, GAO Chi <[email protected] 
<mailto:[email protected]> > wrote:

Hi all,

 

We encountered a strange behavior after upgrading to HIVE 2.0.0 + TEZ 0.8.2.  

 

I simplified our query to this:

 

SELECT

  a.key,

  a.a_one,

  b.b_one,

  a.a_zero,

  b.b_zero

FROM

(

    SELECT

      11 key,

      0 confuse_you,

      1 a_one,

      0 a_zero

) a

LEFT JOIN 

(

    SELECT

      11 key,

      0 confuse_you,

      1 b_one,

      0 b_zero

) b 

ON a.key = b.key

;

 

 

Above query generates this unexpected result:

 

INFO  : Status: Running (Executing on YARN cluster with App id 
application_1456723490535_3653)

 

INFO  : Map 1: 0/1      Map 2: 0/1

INFO  : Map 1: 0/1      Map 2: 0(+1)/1

INFO  : Map 1: 0(+1)/1  Map 2: 0(+1)/1

INFO  : Map 1: 0(+1)/1  Map 2: 1/1

INFO  : Map 1: 1/1      Map 2: 1/1

INFO  : Completed executing 
command(queryId=hive_20160301115630_0a0dbee5-ba4b-45e7-b027-085f655640fd); Time 
taken: 10.225 seconds

INFO  : OK

+--------+----------+----------+-----------+-----------+--+

| a.key  | a.a_one  | b.b_one  | a.a_zero  | b.b_zero  |

+--------+----------+----------+-----------+-----------+--+

| 11     | 1        | 0        | 0         | 1         |

+--------+----------+----------+-----------+-----------+--+

 

If you change the constant value of subquery-b’s confuse_you column from 0 to 
2, the problem disappears. The plan returned from EXPLAIN shows the incorrect 
one is picking _col1 and _col2, while the correct one is picking _col2 and 
_col3 form sub query b.

 

Seems it cannot distinguish 2 columns with same constant value?

 

 

Anyone encountered similar problem?

 

 

Thanks!

 

Chi

 

 

Reply via email to