Re: Optimize Hive Query

Gopal Vijayaraghavan Fri, 24 Jun 2016 15:16:27 -0700


> Please help me on this....let me know you need other info.


Are the ORC tables fully compacted? It looks like you're using Hive ACID
tables, which do not perform well until the delta files are compacted.

dfs -ls <path>;

should tell you whether there are any delta_* files in the list.
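
If the listing is long, the check can be scripted. The following is only a
minimal sketch in Python: it assumes you have captured the `dfs -ls` output as
text, and that delta directories follow the usual delta_<n>_<n> naming. The
function name, the sample paths and the sample listing are all made up for
illustration.

```python
import re

# Hive ACID delta directories are named like delta_0000006_0000006
# (optionally with a trailing _<statement id>).
DELTA_RE = re.compile(r"^delta_\d+_\d+(_\d+)?$")


def find_delta_dirs(listing):
    """Return paths from `dfs -ls` output whose last component is a delta dir."""
    deltas = []
    for line in listing.splitlines():
        fields = line.split()
        if not fields:
            continue
        path = fields[-1]  # the path is the last column of each listing row
        name = path.rstrip("/").rsplit("/", 1)[-1]
        if DELTA_RE.match(name):
            deltas.append(path)
    return deltas


# Hypothetical listing, for illustration only (not from the original thread).
SAMPLE_LISTING = """\
Found 3 items
drwxr-xr-x   - hive hive          0 2016-06-20 10:00 /warehouse/t/base_0000005
drwxr-xr-x   - hive hive          0 2016-06-21 10:00 /warehouse/t/delta_0000006_0000006
drwxr-xr-x   - hive hive          0 2016-06-22 10:00 /warehouse/t/delta_0000007_0000007
"""

if __name__ == "__main__":
    for path in find_delta_dirs(SAMPLE_LISTING):
        print(path)
```

Any path it reports is a delta that has not yet been folded into a base_*
directory by a major compaction.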

> |                            ACID table:true                            |
> |                            alias:tuning_dd_key                        |

Other than that, it does look like you have only 1 shuffle and if you did
run this using Tez, I recommend using

<https://github.com/apache/tez/tree/master/tez-tools/swimlanes>

to find out the slowest task & find more information about it. You will
get a diagram which looks like this

<http://www.slideshare.net/t3rmin4t0r/tez8-ui-walkthrough/20>


The longest bar in that diagram is the slowest task. I have another version
of it, not yet released and a bit hard to explain, which produces an image
that looks like

<http://people.apache.org/~gopalv/q21_suppliers_who_kept_orders_waiting.svg>


<https://github.com/t3rmin4t0r/tez-swimlanes/blob/master/vertex.py>

which is better at finding the one or two skewed reducers in a large DAG
like q21 [1].
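
The idea of picking out the slowest task, or the few reducers that skew, can
be sketched in a few lines of Python. This is not what the swimlanes scripts
do internally; it is only an illustration that assumes you already have
per-task durations in a dict. The task names, the durations and the 3x-median
threshold are all hypothetical.

```python
from statistics import median


def find_skewed_tasks(durations, factor=3.0):
    """Return (task_id, seconds) pairs slower than factor * median, slowest first.

    `durations` maps a task id to its run time in seconds. The threshold
    `factor` is an arbitrary choice for this sketch.
    """
    if not durations:
        return []
    typical = median(durations.values())
    skewed = [(task, secs) for task, secs in durations.items()
              if secs > factor * typical]
    return sorted(skewed, key=lambda pair: pair[1], reverse=True)


# Hypothetical durations, for illustration only.
SAMPLE_DURATIONS = {
    "reducer_0": 12.0,
    "reducer_1": 11.5,
    "reducer_2": 95.0,
    "reducer_3": 13.0,
}

if __name__ == "__main__":
    for task, secs in find_skewed_tasks(SAMPLE_DURATIONS):
        print(task, secs)
```

With the sample above only reducer_2 is flagged, which is the equivalent of
the longest bar in the diagram.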


Cheers,
Gopal
[1] - <http://people.apache.org/~gopalv/tpch-plans/q21_suppliers_who_kept_orders_waiting.svg>
