Hi Mayur,

I cannot use Spark SQL in this case because many of the aggregations are not
supported yet. Hence I migrated back to Shark, where all of those aggregation
functions are supported.

http://apache-spark-user-list.1001560.n3.nabble.com/Support-for-Percentile-and-Variance-Aggregation-functions-in-Spark-with-HiveContext-td10658.html

I forgot to mention earlier in the thread that the raw_table I am using is
actually a Parquet table.

>> 2. cache data at a partition level from Hive & operate on those instead.

Do you mean that I need to cache the table created by querying the data for a
set of a few months, and then issue the ad hoc query on that cached table?



Thanks and regards
Vinay Kashyap



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Low-Performance-of-Shark-over-Spark-tp11649p11776.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
