Hi,
I am using Hive 1.1.0 and Spark 1.5.1, and I create a HiveContext in
spark-shell.

Spark SQL is performing worse than Hive, the opposite of what I expected.
With default settings, Hive returns the result of a plain select * query on
a 1 GB dataset containing 3623203 records in 27 seconds, while Spark SQL
takes 2 minutes for the same query with a collect operation.
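
For reference, the Spark side looks roughly like the sketch below. The table
name my_table is a placeholder and the timed helper is a hypothetical
convenience, neither is from my actual session. It assumes spark-shell, where
sc is already defined.

```scala
// Run inside spark-shell (Spark 1.5.x), where `sc` is already defined.
import org.apache.spark.sql.hive.HiveContext

val hiveContext = new HiveContext(sc)

// Hypothetical helper: runs a block and returns its result
// together with the elapsed wall-clock time in seconds.
def timed[T](body: => T): (T, Double) = {
  val start = System.nanoTime()
  val result = body
  (result, (System.nanoTime() - start) / 1e9)
}

// `my_table` is a placeholder table name.
val df = hiveContext.sql("SELECT * FROM my_table")

// collect() pulls every row into the driver JVM.
val (rows, collectSecs) = timed { df.collect() }
println(s"collect: ${rows.length} rows in $collectSecs s")

// count() scans the same data but returns only one number to the driver.
val (n, countSecs) = timed { df.count() }
println(s"count: $n rows in $countSecs s")
```

Comparing the two timings should indicate how much of the 2 minutes goes into
moving rows to the driver rather than into scanning the data.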

Cluster config:
Hive : 6 nodes, 16 GB memory and 4 cores each
Spark : 4 nodes, 16 GB memory and 4 cores each

My dataset has 45 partitions and spark-sql creates 82 jobs.

I have tried all the memory and garbage collection optimizations suggested
on the official website, but performance did not improve. It is also worth
mentioning that I sometimes get an OOM error when I allocate less than 10G
of executor memory.

Can somebody tell me what is actually going on?
