sunchao commented on pull request #30284: URL: https://github.com/apache/spark/pull/30284#issuecomment-724186158
> For example, we can place the downloaded jars ahead to have a higher precedence in the classpath.

@HyukjinKwon I'm not sure if this can solve the problem. The fundamental issue is that we are trying to mix a single Hadoop version (i.e., the one used by Spark, for the `hadoop-client` jar) with various others from the different Hive versions supported, so it is bound to load some classes from the former and some from the latter. To truly solve the issue and make `sharesHadoopClasses` stick to its meaning, IMO we'd have to pick a Hadoop version that matches the specific Hive version. For instance, if the Hive version is 2.3.7, we should use Hadoop 2.7.2, and if the Hive version is 2.2.0, we should pick Hadoop 2.6.0 instead. However, I'm not sure this is something we should do, given that some of the Hadoop versions used by Hive are really old, and also this is a rarely used feature, as I mentioned in the PR description.
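
To illustrate the idea, here is a minimal sketch (not something this PR implements) of resolving a Hadoop version from the Hive metastore version instead of always reusing Spark's built-in `hadoop-client`. The object and method names are hypothetical, and only the two version pairs mentioned above are taken from the comment; the fallback behavior is an assumption.

```scala
// Hypothetical sketch: choose the Hadoop version that the given Hive
// metastore version was built against, falling back to Spark's built-in one.
object HiveHadoopVersionMatcher {
  // Version pairs mentioned in the comment; a real table would cover all supported Hive versions.
  private val hadoopVersionForHive: Map[String, String] = Map(
    "2.3.7" -> "2.7.2",
    "2.2.0" -> "2.6.0"
  )

  /** Return the matching Hadoop version, or the built-in one if no mapping is known. */
  def resolve(hiveVersion: String, builtInHadoopVersion: String): String =
    hadoopVersionForHive.getOrElse(hiveVersion, builtInHadoopVersion)
}

// Example: HiveHadoopVersionMatcher.resolve("2.3.7", "3.2.0") returns "2.7.2".
```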
