Hi, if you are already running Hive on Tez, the performance gain from moving to Spark is unlikely to be obvious. I'd recommend experimenting with Spark on something new until you have a better understanding of it.
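
In case it helps with the second option (pure Spark code with Hive integration), here is a minimal PySpark sketch of what such a job might look like, including the explicit caching and evicting mentioned in the question. The table names, the `event_date` column, and the helper names `make_session` and `build_daily_summary` are all made up for illustration; this is a sketch under those assumptions, not a tuned implementation.

```python
# Sketch only: table names, the event_date column, and both helper names
# are hypothetical placeholders, not taken from any real job.


def make_session(app_name="hive-to-spark-experiment"):
    """Create a SparkSession that can read tables from the Hive metastore."""
    # Imported lazily so the rest of the module can be read without Spark.
    from pyspark.sql import SparkSession

    return (
        SparkSession.builder
        .appName(app_name)
        .enableHiveSupport()  # lets spark.sql() see existing Hive tables
        .getOrCreate()
    )


def build_daily_summary(spark, source_table, target_table):
    """Read a Hive table, reuse a cached DataFrame, and write a summary table."""
    events = spark.sql(f"SELECT * FROM {source_table}")

    # Cache explicitly because the DataFrame is used twice below.
    events.cache()
    try:
        row_count = events.count()  # first action materializes the cache
        summary = events.groupBy("event_date").count()
        summary.write.mode("overwrite").saveAsTable(target_table)
    finally:
        events.unpersist()  # evict once the reuse is over
    return row_count


# Example usage on a cluster (assumed table names):
#   spark = make_session()
#   build_daily_summary(spark, "my_db.events", "my_db.daily_event_counts")
#   spark.stop()
```

Whether this ends up faster than the same query on Tez depends on the workload, so it is worth timing both on a representative job before migrating anything.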

On Tue, Oct 6, 2020 at 23:47, Manu Jacob <manu.ja...@sas.com> wrote:

> Hi All,
>
> Not sure if I need to ask this question on hive community or spark
> community.
>
> We have a set of hive scripts that runs on EMR (Tez engine). We would like
> to experiment by moving some of it onto Spark. We are planning to
> experiment with two options.
>
> 1. Use the current code based on HQL, with engine set as spark.
> 2. Write pure spark code in scala/python using Spark SQL and hive
>    integration.
>
> The first approach helps us to transition to Spark quickly but not sure if
> this is the best approach in terms of performance. Could not find any
> reasonable comparisons of these two approaches. It looks like writing pure
> Spark code gives us more control to add logic and also control some of the
> performance features, for example things like caching/evicting etc.
>
> Any advice on this is much appreciated.
>
> Thanks,
> -Manu