sunchao commented on pull request #30284: URL: https://github.com/apache/spark/pull/30284#issuecomment-724186158
> For example, we can place the downloaded jars ahead to have a higher precedence in the classpath.

@HyukjinKwon I'm not sure if this can solve the problem. The fundamental issue is that we are trying to mix a single Hadoop version (i.e., the one used by Spark, for the `hadoop-client` jar) with various others from the different Hive versions supported, so it is bound to load some classes from the former and some from the latter. To truly solve the issue and make `sharesHadoopClasses` stick to its meaning, IMO we'd have to pick a Hadoop version that matches the specific Hive version. For instance, if the Hive version is 2.3.7, we should use Hadoop 2.7.2, and if the Hive version is 2.2.0, we should pick Hadoop 2.6.0 instead. However, I'm not sure this is something we should do, given that some of the Hadoop versions used by Hive are really old, and also this is a rarely used feature, as I mentioned in the PR description.
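
To illustrate the idea, here is a minimal sketch (not something this PR implements) of resolving a Hadoop version from the Hive metastore version instead of always reusing Spark's built-in `hadoop-client`. The object and method names are hypothetical, and only the two version pairs mentioned above are taken from the comment; the fallback behavior is an assumption.

```scala
// Hypothetical sketch: choose the Hadoop version that the given Hive
// metastore version was built against, falling back to Spark's built-in one.
object HiveHadoopVersionMatcher {
  // Version pairs mentioned in the comment; a real table would cover all supported Hive versions.
  private val hadoopVersionForHive: Map[String, String] = Map(
    "2.3.7" -> "2.7.2",
    "2.2.0" -> "2.6.0"
  )

  /** Return the matching Hadoop version, or the built-in one if no mapping is known. */
  def resolve(hiveVersion: String, builtInHadoopVersion: String): String =
    hadoopVersionForHive.getOrElse(hiveVersion, builtInHadoopVersion)
}

// Example: HiveHadoopVersionMatcher.resolve("2.3.7", "3.2.0") returns "2.7.2".
```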
