RE: Benchmarking Hive Changes

java8964 Wed, 05 Mar 2014 05:49:01 -0800

Are you running this on a single standalone box? How large are your test files, 
and how long did the jobs of each type take?
Yong


> From: [email protected]
> Subject: Benchmarking Hive Changes
> Date: Tue, 4 Mar 2014 21:31:42 -0500
> To: [email protected]
> 
> I’ve been trying to benchmark some of the Hive enhancements in Hadoop 2.0 
> using the HDP Sandbox. 
> 
> I took one of their example queries and executed it with the tables stored as 
> TEXTFILE, RCFILE, and ORC. I also tried enabling vectorized execution and 
> predicate pushdown.
> 
> SELECT s07.description, s07.salary, s08.salary,
>   s08.salary - s07.salary
> FROM
>   sample_07 s07 JOIN sample_08 s08
> ON ( s07.code = s08.code)
> WHERE
>  s07.salary < s08.salary
> SORT BY s08.salary-s07.salary DESC
> 
> Ultimately, there was not much difference in performance between any of the 
> executions. Can someone clarify whether I need an actual full cluster to see 
> performance improvements, or whether I'm missing something else? I thought 
> that, at minimum, I would see an improvement moving from TEXTFILE to ORC.
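For reference, the setup described in the quoted message might look roughly like the HiveQL sketch below. The `_orc` and `_rc` table names are hypothetical. The settings are the standard Hive properties for vectorized execution (`hive.vectorized.execution.enabled`, available from Hive 0.13) and predicate pushdown (`hive.optimize.ppd`, plus `hive.optimize.index.filter` for pushing filters into the ORC reader). Check them against the Hive version shipped with your Sandbox.

```sql
-- Hypothetical ORC and RCFILE copies of the sample tables
CREATE TABLE sample_07_orc STORED AS ORC AS SELECT * FROM sample_07;
CREATE TABLE sample_08_orc STORED AS ORC AS SELECT * FROM sample_08;
CREATE TABLE sample_07_rc STORED AS RCFILE AS SELECT * FROM sample_07;
CREATE TABLE sample_08_rc STORED AS RCFILE AS SELECT * FROM sample_08;

-- Enable vectorized execution (Hive 0.13 and later)
SET hive.vectorized.execution.enabled=true;
-- Enable predicate pushdown in the query planner
SET hive.optimize.ppd=true;
-- Push filters down into the ORC reader
SET hive.optimize.index.filter=true;

-- The same query as in the original message, pointed at the ORC copies
SELECT s07.description, s07.salary, s08.salary,
  s08.salary - s07.salary
FROM
  sample_07_orc s07 JOIN sample_08_orc s08
ON (s07.code = s08.code)
WHERE
  s07.salary < s08.salary
SORT BY s08.salary - s07.salary DESC;
```

One caveat: in Hive 0.13, vectorized execution was limited to ORC-backed tables. Enabling it for the TEXTFILE or RCFILE copies would therefore not be expected to change much. Recording the input sizes and the wall-clock time of each variant gives the numbers asked for in the reply.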
