Hi Mayur,

I cannot use Spark SQL in this case because many of the aggregation functions are not supported yet. Hence I migrated back to using Shark, since all of those aggregation functions are supported there.
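For reference, the kind of query I need is roughly the sketch below, written against HiveContext. It is only a rough sketch: group_id and metric_value are placeholder column names, not my real schema, and this is the style of aggregation that currently fails for me through Spark SQL but runs on Shark.

  import org.apache.spark.{SparkConf, SparkContext}
  import org.apache.spark.sql.hive.HiveContext

  object PercentileVarianceSketch {
    def main(args: Array[String]): Unit = {
      val sc = new SparkContext(new SparkConf().setAppName("percentile-variance-sketch"))
      val hiveContext = new HiveContext(sc)

      // percentile() and variance() are Hive UDAFs; group_id and metric_value
      // are made-up column names standing in for the real raw_table schema.
      // (On Spark 1.0.x the HiveQL entry point was hiveContext.hql(...);
      // later releases accept the same statement through sql(...).)
      val result = hiveContext.sql(
        """SELECT group_id,
          |       percentile(metric_value, 0.95) AS p95,
          |       variance(metric_value)         AS var_metric
          |  FROM raw_table
          | GROUP BY group_id""".stripMargin)

      result.collect().foreach(println)
      sc.stop()
    }
  }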
http://apache-spark-user-list.1001560.n3.nabble.com/Support-for-Percentile-and-Variance-Aggregation-functions-in-Spark-with-HiveContext-td10658.html

Forgot to mention in the earlier thread that the raw_table I am using is actually a Parquet table.

>> 2. cache data at a partition level from Hive & operate on those instead.

Do you mean that I should cache the table created by querying the data for a set of a few months and then issue the ad-hoc queries on that table?

Thanks and regards,
Vinay Kashyap
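PS: To make sure I understood point 2, here is a rough sketch (in Spark SQL terms, assuming cacheTable and with made-up column names such as event_month) of what I think you mean by caching the few-months subset and querying it:

  import org.apache.spark.{SparkConf, SparkContext}
  import org.apache.spark.sql.hive.HiveContext

  object CachedMonthsSketch {
    def main(args: Array[String]): Unit = {
      val sc = new SparkContext(new SparkConf().setAppName("cached-months-sketch"))
      val hiveContext = new HiveContext(sc)

      // Select only the few months of interest from the Parquet-backed
      // raw_table. event_month is an assumed partition/filter column.
      val fewMonths = hiveContext.sql(
        "SELECT * FROM raw_table WHERE event_month BETWEEN '2014-01' AND '2014-03'")

      // Register the subset and pin it in memory so that ad-hoc queries
      // scan only the cached months instead of the full table.
      fewMonths.registerTempTable("raw_table_recent")   // registerAsTable on Spark 1.0.x
      hiveContext.cacheTable("raw_table_recent")

      // An ad-hoc query now hits the cached subset.
      hiveContext.sql("SELECT count(*) FROM raw_table_recent").collect().foreach(println)

      sc.stop()
    }
  }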