To add to what Vikash said above, a bit more on the internals:

1. There are two components which work together to achieve Hive + Spark integration:
   a. HiveContext, which extends SQLContext and adds Hive-specific logic, e.g. loading the jars needed to talk to the underlying metastore db and loading the configs in hive-site.xml.
   b. HiveThriftServer2, which wraps the native HiveServer2 and adds logic for creating sessions and handling operations.
2. Once the thrift server is up, authentication and session management are all delegated to the Hive classes. Once the query is parsed, a logical plan is created and used to build a DataFrame.
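The two-component split above can be sketched in plain Python. This is a purely conceptual illustration: the class names mirror the real Spark/Hive components, but the bodies are stand-ins, not the actual Spark APIs.

```python
# Conceptual sketch of the HiveContext / HiveThriftServer2 split.
# HiveContext extends SQLContext, layering on metastore configuration;
# the thrift server wraps a context and manages client sessions.

class SQLContext:
    """Base query entry point: parses SQL into a logical plan."""
    def __init__(self):
        self.conf = {}

    def sql(self, query):
        # In real Spark, parsing produces a logical plan that becomes
        # a DataFrame; here we just record a placeholder plan.
        return {"logical_plan": f"Parse({query})"}

class HiveContext(SQLContext):
    """Adds Hive specifics: metastore connectivity and hive-site.xml configs."""
    def __init__(self, hive_site):
        super().__init__()
        # Stand-in for loading hive-site.xml and metastore client jars.
        self.conf.update(hive_site)

class HiveThriftServer2:
    """Wraps a context; session handling is delegated, as in point 2 above."""
    def __init__(self, context):
        self.context = context
        self.sessions = {}

    def open_session(self, user):
        self.sessions[user] = []
        return user

    def execute(self, session, query):
        plan = self.context.sql(query)
        self.sessions[session].append(plan)
        return plan

hc = HiveContext({"hive.metastore.uris": "thrift://metastore:9083"})
server = HiveThriftServer2(hc)
s = server.open_session("alice")
print(server.execute(s, "SELECT 1")["logical_plan"])  # Parse(SELECT 1)
```

The point of the sketch is the inheritance shape: HiveContext reuses all of SQLContext's planning path and only adds metastore wiring on top.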
So no MapReduce: Spark uses just the pieces it needs from Hive and runs the query on its own execution engine.

--
Regards,
Lalit

On Wed, Jun 8, 2016 at 9:59 PM, Vikash Pareek <vikash.par...@infoobjects.com> wrote:
> Himanshu,
>
> Spark doesn't use the Hive execution engine (MapReduce) to execute the query.
> Spark only reads the metadata from the Hive metastore db and executes the
> query within the Spark execution engine. This metadata is used by Spark's own
> SQL execution engine (which includes components such as Catalyst and Tungsten
> to optimize queries) to execute the query and generate results faster than
> Hive (MapReduce).
>
> Using HiveContext means connecting to the Hive metastore db. Thus, HiveContext
> can access the Hive metadata, which includes the location of the data,
> serialization and deserialization, compression codecs, columns, datatypes,
> etc. Spark therefore has enough information about the Hive tables and their
> data to understand the target data and execute the query over its own
> execution engine.
>
> Overall, Spark replaces the MapReduce model completely with its
> in-memory (RDD) computation engine.
>
> - Vikash Pareek
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/When-queried-through-hiveContext-does-hive-executes-these-queries-using-its-execution-engine-default-tp27114p27117.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
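The metadata-versus-execution split described in the quoted message can also be sketched conceptually. Everything here is illustrative: the in-memory "metastore" dict and the function names are invented for the example, not real Spark or Hive APIs.

```python
# Conceptual sketch: Spark reads only *metadata* from the Hive metastore
# (location, columns, serde info), then scans and computes with its own
# in-memory engine -- no MapReduce job is launched.

METASTORE = {
    "sales": {
        "location": "/warehouse/sales",
        "columns": ["region", "amount"],
        # Stand-in for the rows the files at `location` would contain.
        "rows": [("eu", 10), ("us", 25), ("eu", 5)],
    }
}

def lookup_table(name):
    """Metastore's role: serve table metadata on request."""
    return METASTORE[name]

def spark_execute(table_name, predicate):
    """Engine's role: filter and aggregate in memory using the metadata."""
    meta = lookup_table(table_name)
    return sum(amount for region, amount in meta["rows"] if predicate(region))

total = spark_execute("sales", lambda r: r == "eu")
print(total)  # 15
```

The design point is that the metastore is consulted only for *where and how* the data lives; the actual scan, filter, and aggregation all happen inside the caller's own engine.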