Github user vanzin commented on the pull request:
https://github.com/apache/spark/pull/1813#issuecomment-51633944
Lots of comments, let me try to write a coherent response to all of them.
:-)
### Shaded jars and maven central
Shouldn't be a problem. You're publishing your own project's artifact to Maven
Central, and that artifact just happens to include the shaded classes. This is
similar to how assembly jars are published (http://search.maven.org/#browse|-1363484021).
### The Optional class
It should be possible to use Guava's original Optional, without having to
fork it. The only reason I forked it is to allow changing the version of Guava
without affecting the Spark API. But that can be done on-demand if needed. I'd
still package it in the spark-core jar; otherwise the guava dependency would
need to have "compile" scope (see below).
### Compile vs. Provided
My main argument for using "provided" is to avoid leaking guava into the
user's *compilation classpath*. Users depend on spark-core (for example), and if
spark-core has a compile dependency on guava, guava ends up on the user's
compilation classpath (regardless of whether the user declares spark-core as
compile or provided). If that dependency is provided (and thus not transitive),
it never shows up; if the user needs guava, he has to depend on it explicitly.
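To make the classpath effect concrete, here's roughly what it looks like from the
app's side, in sbt terms (versions here are just illustrative, not what this PR pins):

```scala
// build.sbt sketch. Because spark-core marks guava as "provided", depending on
// spark-core does NOT put Guava on this project's compile classpath.
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.1.0"

// If the application itself uses Guava, it has to say so explicitly:
libraryDependencies += "com.google.guava" % "guava" % "14.0.1"
```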
This does have effects on packaging, though: for people using an existing
spark assembly to run their apps, nothing changes. But if the user is creating
his own uber-jar with spark and everything else in it, and running it in some
custom environment, he'll need to explicitly package guava (even if his app
doesn't use it). My belief is that the former case is the overwhelming
majority, and the latter case means the user has to care about too many things
already (e.g. choosing compatible versions of everything between his app and
spark and hadoop et al) for this little thing to be a real concern.
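Sketched in the same sbt terms (again, versions are only illustrative), the two
cases look like this:

```scala
// (a) Running against an existing Spark assembly: Spark (and the Guava classes
//     bundled in it) are already on the cluster, so nothing changes for the app.
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.1.0" % "provided"

// (b) Building a custom uber-jar that bundles Spark itself: guava must be listed
//     explicitly, since the provided dependency is not pulled in transitively.
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "1.1.0",
  "com.google.guava" % "guava" % "14.0.1"
)
```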
### Shading at compile time
@srowen, let me know if I misunderstood, but do you mean creating a spark-core
jar with the shaded guava classes in it? I actually started with that
alternative, but it seems messy. Either all downstream projects have to
reference the shaded classes (which would also be needed with the separate
shaded-jar approach), or you end up in a weird place where spark-core references
the shaded classes but downstream projects do not. The assembly would fix
everything (by shading again), but someone not using the assembly could get
really weird errors.
Also, having duplicate class names in the classpath really confuses some
IDEs that automatically add imports. More manual configuration for people to
add...
### sbt shading plugin
I think the code I wrote can be added to the sbt-assembly plugin (with some
modifications). That's probably the best place for it anyway, since I created it
as a merge strategy for that plugin. But that's for later.
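For reference, the shading support that sbt-assembly added in later releases
expresses such a rule roughly like this (a sketch only; the relocated package
name is illustrative, and this is not the merge-strategy code from this PR):

```scala
// build.sbt sketch, assuming the sbt-assembly plugin is on the build.
// Relocates Guava classes into a private package inside the assembly jar.
assemblyShadeRules in assembly := Seq(
  ShadeRule.rename("com.google.common.**" -> "org.spark_project.guava.@1").inAll
)
```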