Hi guys
I am using CDH 5.3.3 and that comes with Hive 0.13.1 and Spark 1.2
So to answer your question, it's not Tez (which I believe comes with Hortonworks).
The Hive query was run with Hive defaults.
I have now used additional Hive params to improve the timings:

SET mapreduce.job.reduces=16;
SET mapreduce.tasktracker.map.tasks.maximum=24;
SET mapreduce.tasktracker.reduce.tasks.maximum=16;
SET mapreduce.map.output.compress.codec=org.apache.hadoop.io.compress.SnappyCodec;
SET mapreduce.map.output.compress=true;

NOW
Hive – Time taken: 140.139 seconds, Fetched: 29597 row(s)
(Surprisingly close to spark-sql now LOL. Time to tweak spark-sql next.)

EARLIER RESULTS
Hive – 326.021 seconds, Fetched: 29597 row(s)
Impala – Fetched 27625 row(s) in 17.02s
spark-sql – Time taken: 120.236 seconds 
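
For a rough sense of the relative speedups, here is a minimal Python sketch that divides the default Hive time by each of the other timings. The numbers are copied from the results above; the dictionary labels and the `speedup` helper are just illustrative names. Note that Impala fetched 27625 rows while Hive fetched 29597, so the comparison is only approximate.

```python
# Timings in seconds, copied from the results in this message.
# The labels are illustrative names, not official identifiers.
timings = {
    "hive (defaults)": 326.021,
    "hive (tuned)": 140.139,
    "spark-sql": 120.236,
    "impala": 17.02,
}


def speedup(baseline: float, other: float) -> float:
    """Return how many times faster `other` is than `baseline`."""
    return baseline / other


baseline = timings["hive (defaults)"]
for name, seconds in timings.items():
    ratio = speedup(baseline, seconds)
    print(f"{name:16s} {seconds:8.3f}s  {ratio:5.2f}x vs Hive defaults")
```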

I don't have the bandwidth to manage individual components on the cluster :-) 
since I am doing all this solo and delivering ML solutions to production LOL. So 
I depend on a distribution such as CDH. The downside is that one is always a 
couple of versions behind.
Thanks for your questions.
regards
sanjay
      From: Michael Armbrust <mich...@databricks.com>
 To: user <user@spark.apache.org> 
 Sent: Thursday, June 18, 2015 3:25 PM
 Subject: Re: Spark-sql versus Impala versus Hive
   
I would also love to see a more recent version of Spark SQL.  There have been a 
lot of performance improvements between 1.2 and 1.4 :)


On Thu, Jun 18, 2015 at 3:18 PM, Steve Nunez <snu...@hortonworks.com> wrote:

Interesting. What were the Hive settings? Specifically, it would be useful to 
know if this was Hive on Tez.
- Steve
From: Sanjay Subramanian
Reply-To: Sanjay Subramanian
Date: Thursday, June 18, 2015 at 11:08
To: "user@spark.apache.org"
Subject: Spark-sql versus Impala versus Hive


I just published the results of my findings here:
https://bigdatalatte.wordpress.com/2015/06/18/spark-sql-versus-impala-versus-hive/







  
