JoshRosen edited a comment on pull request #33989:
URL: https://github.com/apache/spark/pull/33989#issuecomment-928616164


   I think the test failures seen above are due to the tests not picking up the 
shaded classes: I wouldn't expect to see references to non-relocated Guava 
classnames if the shaded Hive were being used by the tests.
   
   The SBT build doesn't perform any shading (because the `sbt-pom-reader` 
plugin only captures the project dependency structure, not build plugins). To 
date this hasn't been a problem since Spark's existing shading was only for the 
benefit of downstream consumers of Spark itself and wasn't necessary for 
Spark's internal functioning (and official releases are published with Maven, 
not SBT). With this new `hive-shaded` module, though, we'll need to find a way 
to wire the shaded classes into the SBT build.
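   
   One possible direction (just a sketch, not something this PR does today): 
sbt-assembly supports class relocation via `ShadeRule`, so the SBT build could 
in principle apply the same renames that the Maven shading performs. The 
`org.sparkproject.guava` target below is an assumed relocation prefix:
   
   ```scala
   // Hypothetical sketch for SparkBuild.scala, assuming the sbt-assembly
   // plugin is on the build classpath. ShadeRule.rename relocates classes
   // at assembly time, mirroring what maven-shade-plugin does at `package`.
   import sbtassembly.AssemblyPlugin.autoImport._
   
   lazy val hiveShadedSettings = Seq(
     assembly / assemblyShadeRules := Seq(
       // `org.sparkproject.guava` is an assumed relocation target.
       ShadeRule.rename("com.google.common.**" -> "org.sparkproject.guava.@1").inAll
     )
   )
   ```
   
   The catch is that sbt-assembly only applies these renames when building the 
assembly jar, so the shaded output would still need to be wired onto the test 
classpath somehow.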
   
   I think we'll also run into similar problems in the Maven build. According 
to [Maven's build lifecycle 
docs](https://maven.apache.org/guides/introduction/introduction-to-the-lifecycle.html):
   
   > `test` - test the compiled source code using a suitable unit testing 
framework. These tests should not require the code be packaged or deployed
   
   The shading runs during the `package` phase, which comes after the `test` 
phase, so the tests won't be able to use the shaded artifacts.
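   
   For reference, a typical maven-shade-plugin configuration illustrates why: 
the `shade` goal binds to `package`, so relocated classes only exist after the 
tests have already run. This is a generic sketch (the `org.sparkproject.guava` 
target is an assumed relocation prefix), not the exact configuration in this PR:
   
   ```xml
   <plugin>
     <groupId>org.apache.maven.plugins</groupId>
     <artifactId>maven-shade-plugin</artifactId>
     <executions>
       <execution>
         <!-- Shading runs at `package`, which the lifecycle orders after `test`. -->
         <phase>package</phase>
         <goals><goal>shade</goal></goals>
         <configuration>
           <relocations>
             <relocation>
               <pattern>com.google.common</pattern>
               <!-- Assumed relocation target -->
               <shadedPattern>org.sparkproject.guava</shadedPattern>
             </relocation>
           </relocations>
         </configuration>
       </execution>
     </executions>
   </plugin>
   ```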
   
   The `hive-shaded` module doesn't depend on any source code from Spark. Given 
this, a clean path forward might be to publish this Maven artifact separately 
from Spark's main build, then depend on the published artifact. This would be 
similar to the approach taken in https://github.com/apache/hbase-thirdparty and 
https://github.com/apache/hadoop-thirdparty . This would require a bunch of new 
infra setup, though, so if we're considering pursuing this then we should open 
a new thread on the dev@ mailing list to discuss.
   
   For now, you could test out that approach via manual local publishing: 
that'd unblock progress on testing the `IsolatedClientLoader` and other things.
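   
   Concretely, the manual local publishing might look something like this (the 
module path and coordinates are hypothetical placeholders):
   
   ```
   # Build and install only the shaded module into the local ~/.m2 repository.
   # The module path below is a hypothetical placeholder.
   ./build/mvn -pl sql/hive-shaded clean install -DskipTests
   
   # Then point the main build's dependency at the locally installed artifact
   # (e.g. org.apache.spark:spark-hive-shaded:<local-version>) and rerun the
   # affected tests.
   ```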
   
   Why did the tests pass before? It looks like your most recent commit in 
c2ec1f8122ea9b871a1679571f0b20009be39cb7 included the `DependencyOverrides` 
version bump in SparkBuild.scala as well as the change to the 
`IsolatedClientLoader`. Prior to the `DependencyOverrides` change, the SBT 
build was using both unshaded Hive and old Guava, so I think the previous 
successful tests were passing accidentally and that the new failures aren't 
caused by the `IsolatedClientLoader` change.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


