Hi Manu,

Back in July 2016 I gave a presentation in London, organised by what was
then Hortonworks, titled "Query Engines for Hive: MR, Spark, Tez with
LLAP – Considerations!"

The PDF of the presentation is here
<https://talebzadehmich.files.wordpress.com/2016/08/hive_on_spark_only.pdf>,
with the caveat that it is now more than four years old.

However, as of today I would recommend writing the code in Spark with Scala
and running against Spark. You can try it using spark-shell to start with.

If you are reading from a Hive table or any other source such as CSV, there
are plenty of examples on the Spark website: https://spark.apache.org/examples.html
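
To give a flavour of what this looks like, here is a minimal sketch in Scala,
not a definitive implementation. It assumes Spark is built with Hive support
and can reach your metastore. The table name mydb.sales, the column region and
the path /tmp/customers.csv are made up for illustration, so substitute your
own. In spark-shell a session called spark already exists, so there you can
skip the builder and paste the statements from main directly.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object HiveAndCsvSketch {

  // Small transformation kept separate so it can be tried on any DataFrame.
  // Assumes the input has a column called "region" (hypothetical).
  def countByRegion(sales: DataFrame): DataFrame =
    sales.groupBy("region").count()

  def main(args: Array[String]): Unit = {
    // Only needed in a standalone application; spark-shell provides `spark`.
    val spark = SparkSession.builder()
      .appName("hive-and-csv-sketch")
      .enableHiveSupport() // requires Hive classes and metastore access
      .getOrCreate()

    // Read a Hive table with SQL; existing HQL can often be reused here.
    val sales: DataFrame = spark.sql("SELECT * FROM mydb.sales") // hypothetical table

    // Read a CSV file with a header row, letting Spark infer column types.
    val customers: DataFrame = spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv("/tmp/customers.csv") // hypothetical path

    // Explicit caching is one of the controls you gain by writing Spark code.
    sales.cache()
    countByRegion(sales).show()
    customers.printSchema()
    sales.unpersist()

    spark.stop()
  }
}
```

Passing the existing HQL to spark.sql is usually the quickest way to start;
you can then move individual steps to the DataFrame API where you want more
control.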

I suggest Scala because Spark itself is written in Scala, although Python
is more popular with data scientists.

HTH



LinkedIn:
https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw





*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.




On Tue, 6 Oct 2020 at 16:47, Manu Jacob <manu.ja...@sas.com> wrote:

> Hi All,
>
>
>
> Not sure whether I should ask this question on the Hive community or the
> Spark community.
>
>
>
> We have a set of Hive scripts that run on EMR (Tez engine). We would like
> to experiment by moving some of them onto Spark. We are planning to
> experiment with two options.
>
>
>    1. Use the current code based on HQL, with the engine set to Spark.
>    2. Write pure Spark code in Scala/Python using Spark SQL and Hive
>    integration.
>
>
>
> The first approach helps us to transition to Spark quickly, but we are not
> sure if it is the best approach in terms of performance. We could not find
> any reasonable comparisons of these two approaches. It looks like writing
> pure Spark code gives us more control to add logic and also to control some
> of the performance features, for example caching/evicting.
>
>
>
>
>
> Any advice on this is much appreciated.
>
>
>
>
>
> Thanks,
>
> -Manu
>
