Are you running this on a single standalone box? How large are your test files, and how long did the jobs of each type take?

Yong
> From: [email protected]
> Subject: Benchmarking Hive Changes
> Date: Tue, 4 Mar 2014 21:31:42 -0500
> To: [email protected]
>
> I’ve been trying to benchmark some of the Hive enhancements in Hadoop 2.0
> using the HDP Sandbox.
>
> I took one of their example queries and executed it with the tables stored as
> TEXTFILE, RCFILE, and ORC. I also tried enabling vectorized execution and
> predicate pushdown.
>
> SELECT s07.description, s07.salary, s08.salary,
>        s08.salary - s07.salary
> FROM sample_07 s07
> JOIN sample_08 s08 ON (s07.code = s08.code)
> WHERE s07.salary < s08.salary
> SORT BY s08.salary - s07.salary DESC
>
> Ultimately there was not much difference in performance between any of the
> executions. Can someone clarify whether I need an actual full cluster to see
> performance improvements, or whether I’m missing something else? I thought
> that at minimum I would have seen an improvement moving from TEXTFILE to ORC.
