Adam Szita created HIVE-21096:
---------------------------------

             Summary: Remove unnecessary Spark dependency from HS2 process
                 Key: HIVE-21096
                 URL: https://issues.apache.org/jira/browse/HIVE-21096
             Project: Hive
          Issue Type: Improvement
          Components: HiveServer2, Spark
            Reporter: Adam Szita
            Assignee: Adam Szita


When a HiveOnSpark job is kicked off, most of the work is done by the 
RemoteDriver, which is a separate process. There are a couple of smaller parts 
of the code where the HS2 process depends on Spark jars; these include, for 
example, receiving stats from the driver and putting together a Spark conf 
object, which is used mostly during communication with the RemoteDriver.

We can limit the data types used for such communication so that we don't use 
(and serialize) types that are in the Spark codebase, and hence we can refactor 
our code to only use Spark jars in the RemoteDriver process.
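
As a rough illustration of the idea (not the actual patch), the configuration 
sent from HS2 could be carried in a message type built only from JDK classes, 
so that nothing from the Spark codebase is needed to serialize it. The class 
name PlainJobConf and its methods are hypothetical and do not exist in Hive; 
plain Java serialization is used here only to keep the sketch self-contained. 
The RemoteDriver side, which does have the Spark jars, would presumably copy 
the entries into its own SparkConf after deserializing.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical message type (not an existing Hive class): it carries Spark
// settings as plain strings, so the sender needs no Spark classes on its
// classpath to build or serialize it.
final class PlainJobConf implements Serializable {
    private static final long serialVersionUID = 1L;

    // HashMap<String, String> is a JDK type and is itself Serializable.
    private final HashMap<String, String> entries;

    PlainJobConf(Map<String, String> entries) {
        this.entries = new HashMap<>(entries);
    }

    // Read-only view of the settings.
    Map<String, String> entries() {
        return Collections.unmodifiableMap(entries);
    }

    // Serialize with plain Java serialization (an assumption made for this
    // sketch; the real RPC layer may use a different serializer).
    byte[] toBytes() {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(this);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Deserialize on the receiving side.
    static PlainJobConf fromBytes(byte[] bytes) {
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return (PlainJobConf) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The point of the sketch is only that both ends agree on a type made of JDK 
classes; any Spark-specific object is constructed after the message has 
crossed the process boundary.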

I think this would be cleaner from a dependency point of view, and also less 
error-prone when users have to assemble the classpath for their HS2 processes. 
(E.g. due to a change between Spark 2.2 and 2.4 we also had to include 
spark-unsafe*.jar, even though it's an internal change in Spark.)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
