GitHub user vanzin opened a pull request:
https://github.com/apache/spark/pull/3658
[SPARK-4809] Rework Guava library shading.
The current way of shading Guava is a little problematic. Code that
depends on "spark-core" does not see the transitive dependency, yet
classes in "spark-core" actually depend on Guava. So it's a little
tricky to run unit tests that use spark-core classes, since you need
a compatible version of Guava in your dependencies when running the
tests. This is easy to get wrong, and makes for a bad user
experience.
This change modifies the way Guava is shaded so that it's applied
uniformly across the Spark build. This means Guava is shaded inside
spark-core itself, so that the dependency issues above are solved.
Aside from that, all Spark sub-modules have their Guava references
relocated, so that they refer to the relocated classes now packaged
inside spark-core. Before, this was only done by the time the assembly
was built, so projects that did not end up inside the assembly (such
as streaming backends) could still reference the original location
of Guava classes.
This relocation does not apply to the sub-modules under network/,
though. For those cases, we want to keep the Guava dependency alive,
since we want to use the same Guava as the rest of the YARN NM
when deploying the auxiliary shuffle service. For the same reason,
the network/ dependencies are shaded into the spark-core artifact
as well, so that the raw Guava dependency doesn't leak.
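For reference, this kind of relocation is typically expressed via the
maven-shade-plugin. A minimal sketch follows; the relocated package
name and plugin wiring here are illustrative, not necessarily the exact
configuration used by this change:

```xml
<!-- Sketch: relocate Guava classes into a Spark-private package at
     package time, so the raw com.google.common dependency does not
     leak to downstream consumers. Names are illustrative. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <executions>
    <execution>
      <phase>package</phase>
      <goals><goal>shade</goal></goals>
      <configuration>
        <relocations>
          <relocation>
            <pattern>com.google.common</pattern>
            <shadedPattern>org.spark-project.guava</shadedPattern>
          </relocation>
        </relocations>
      </configuration>
    </execution>
  </executions>
</plugin>
```

Any module that applies such a relocation has its bytecode references
to com.google.common rewritten to the shaded package, which is why
modules that must share Guava with their host process (like the
network/ ones above) need to be excluded from it.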
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/vanzin/spark SPARK-4809
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/3658.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #3658
----
commit 4a4ed4202eac66bc288c8fcb2107b0608cc1e32f
Author: Marcelo Vanzin <[email protected]>
Date: 2014-11-21T20:25:15Z
[SPARK-4809] Rework Guava library shading.
The current way of shading Guava is a little problematic. Code that
depends on "spark-core" does not see the transitive dependency, yet
classes in "spark-core" actually depend on Guava. So it's a little
tricky to run unit tests that use spark-core classes, since you need
a compatible version of Guava in your dependencies when running the
tests. This is easy to get wrong, and makes for a bad user
experience.
This change modifies the way Guava is shaded so that it's applied
uniformly across the Spark build. This means Guava is shaded inside
spark-core itself, so that the dependency issues above are solved.
Aside from that, all Spark sub-modules have their Guava references
relocated, so that they refer to the relocated classes now packaged
inside spark-core. Before, this was only done by the time the assembly
was built, so projects that did not end up inside the assembly (such
as streaming backends) could still reference the original location
of Guava classes.
This relocation does not apply to the sub-modules under network/,
though. For those cases, we want to keep the Guava dependency alive,
since we want to use the same Guava as the rest of the YARN NM
when deploying the auxiliary shuffle service. For the same reason,
the network/ dependencies are shaded into the spark-core artifact
as well, so that the raw Guava dependency doesn't leak.
----
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]