Re: hive sql on tez run forever

Gopal Vijayaraghavan Mon, 11 May 2015 18:47:54 -0700

Hi,

You’re correct - that is not a valid rewrite.


Because of the OR clause, neither filter can be pushed below the join, so
both tables have to be shuffled across in full, with no reduction in rows.
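
For the record, Eugene's counterexample is easy to check. Below is a minimal sketch, assuming SQLite (through Python's sqlite3 module) as a stand-in for Hive, since only the SQL semantics matter here and not the Hive planner. Tables a and b hold just the single rows from his example.

```python
import sqlite3

# In-memory SQLite database; an assumed stand-in for Hive, used only to
# compare the result sets of the two queries.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
    CREATE TABLE a (x INTEGER, y INTEGER);
    CREATE TABLE b (x INTEGER, z INTEGER);
    INSERT INTO a VALUES (1, 2);  -- a.y = 1 is false for this row
    INSERT INTO b VALUES (1, 1);  -- b.z = 1 is true for this row
""")

# Original query: join with an OR across columns of both tables.
original = cur.execute(
    "SELECT a.*, b.* FROM a, b WHERE a.x = b.x AND (a.y = 1 OR b.z = 1)"
).fetchall()

# Proposed rewrite: filter each side first, then join. This is effectively
# AND rather than OR, because a row must survive both filters.
rewritten = cur.execute(
    "SELECT a1.*, b1.* FROM (SELECT a.* FROM a WHERE a.y = 1) a1, "
    "(SELECT b.* FROM b WHERE b.z = 1) b1 WHERE a1.x = b1.x"
).fetchall()

print(original)   # [(1, 2, 1, 1)]
print(rewritten)  # []
```

The row from a is dropped by the a.y = 1 subselect before the join ever sees it, which is why the rewritten query returns nothing.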

Cheers,
Gopal

On 5/11/15, 10:43 AM, "Eugene Koifman" <ekoif...@hortonworks.com> wrote:

>This isn’t a valid rewrite.
>If a(x,y) has 1 row (1,2) and b(x,z) has 1 row (1,1), then the 1st query
>will produce 1 row,
>but the 2nd query with subselects will not.
>
>On 5/11/15, 10:13 AM, "Gopal Vijayaraghavan" <gop...@apache.org> wrote:
>
>>Hi,
>>
>>> I changed the SQL WHERE condition to (where t.update_time >=
>>> '2015-05-04') and the SQL returns a result after a while, because
>>> t.update_time >= '2015-05-04' can filter out many rows during the
>>> table scan. But why, when I change the WHERE condition to
>>> (where t.update_time >= '2015-05-04' or length(t8.end_user_id)>0),
>>> does the SQL run forever, as follows:
>>
>>
>>The OR clause is probably causing the problems.
>>
>>We're probably not pushing the OR clauses down to the original table
>>scans.
>>
>>This is most likely a Hive PPD (predicate pushdown) miss where you do
>>something like
>>
>>select a.*,b.* from a,b where a.x = b.x and (a.y = 1 or b.z = 1);
>>
>>where it doesn't get planned as
>>
>>select a1.*, b1.* from (select a.* from a where a.y=1) a1, (select b.*
>>from b where b.z = 1) b1 where a1.x = b1.x;
>>
>>instead gets planned as a full-scan JOIN, then a filter.
>>
>>Can you spend some time and try to reduce your case to something
>>like the above queries?
>>
>>If that works, then file a JIRA.
>>
>>Cheers,
>>Gopal
>>
>>
>
