Github user vanzin commented on the pull request:
https://github.com/apache/spark/pull/1813#issuecomment-51633944
Lots of comments, let me try to write a coherent response to all of them.
:-)
### Shaded jars and maven central
Shouldn't be a problem. You're publishing your own project's artifact to Maven
Central, and that artifact just happens to include the shaded classes. This is
similar to how assembly jars are published (http://search.maven.org/#browse|-1363484021).
### The Optional class
It should be possible to use Guava's original Optional, without having to
fork it. The only reason I forked it is to allow changing the version of Guava
without affecting the Spark API. But that can be done on-demand if needed. I'd
still package it in the spark-core jar; otherwise the guava dependency would
need to have "compile" scope (see below).
### Compile vs. Provided
My main argument for using "provided" is to avoid leaking guava into the
user's *compilation classpath*. Users depend on spark-core (for example), and if
spark-core has a compile dependency on guava, guava ends up on the user's
compilation classpath (regardless of whether the user declares spark-core as
compile or provided). If that dependency is provided (and thus not transitive),
it never shows up; if the user needs guava, he has to depend on it explicitly.
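To make the classpath effect concrete, here's roughly what it looks like from the
app's side, in sbt terms (versions here are just illustrative, not what this PR pins):

```scala
// build.sbt sketch. Because spark-core marks guava as "provided", depending on
// spark-core does NOT put Guava on this project's compile classpath.
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.1.0"

// If the application itself uses Guava, it has to say so explicitly:
libraryDependencies += "com.google.guava" % "guava" % "14.0.1"
```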
This does have effects on packaging, though: for people using an existing
spark assembly to run their apps, nothing changes. But if the user is creating
his own uber-jar with spark and everything else in it, and running it in some
custom environment, he'll need to explicitly package guava (even if his app
doesn't use it). My belief is that the former case is the overwhelming
majority, and the latter case means the user has to care about too many things
already (e.g. choosing compatible versions of everything between his app and
spark and hadoop et al) for this little thing to be a real concern.
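Sketched in the same sbt terms (again, versions are only illustrative), the two
cases look like this:

```scala
// (a) Running against an existing Spark assembly: Spark (and the Guava classes
//     bundled in it) are already on the cluster, so nothing changes for the app.
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.1.0" % "provided"

// (b) Building a custom uber-jar that bundles Spark itself: guava must be listed
//     explicitly, since the provided dependency is not pulled in transitively.
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "1.1.0",
  "com.google.guava" % "guava" % "14.0.1"
)
```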
### Shading at compile time
@srowen, let me know if I misunderstood, but do you mean creating a spark-core
jar with the shaded guava classes in it? I actually started with that
alternative, but it seems messy. Either all downstream projects have to
reference the shaded classes (which would also be needed with the separate
shaded-jar approach), or you end up in a weird place where spark-core references
the shaded classes but downstream projects do not. The assembly would fix
everything (by shading again), but someone not using the assembly could get
really weird errors.
Also, having duplicate class names in the classpath really confuses some
IDEs that automatically add imports. More manual configuration for people to
add...
### sbt shading plugin
I think the code I wrote can be added to the sbt-assembly plugin (with some
modifications). That's probably the best place for it anyway, since I created it
as a merge strategy for that plugin. But that's for later.
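For reference, the shading support that sbt-assembly added in later releases
expresses such a rule roughly like this (a sketch only; the relocated package
name is illustrative, and this is not the merge-strategy code from this PR):

```scala
// build.sbt sketch, assuming the sbt-assembly plugin is on the build.
// Relocates Guava classes into a private package inside the assembly jar.
assemblyShadeRules in assembly := Seq(
  ShadeRule.rename("com.google.common.**" -> "org.spark_project.guava.@1").inAll
)
```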