Hi All,

I'm not sure whether this question belongs on the Spark list or the Hive list.

We have a set of Hive scripts that run on EMR (Tez engine). We would like to 
experiment with moving some of them onto Spark, and are planning to try two 
options.


  1.  Keep the current HQL code as-is, with the Hive execution engine set to Spark.
  2.  Write pure Spark code in Scala/Python using Spark SQL with Hive integration.

The first approach would let us transition to Spark quickly, but we are not sure 
whether it is the best option in terms of performance, and we could not find any 
reasonable comparison of these two approaches. It looks like writing pure Spark 
code gives us more control to add logic, and also lets us use some of Spark's 
performance features, for example caching/evicting DataFrames.


Any advice on this is much appreciated.


Thanks,
-Manu
