Github user gkossakowski commented on the pull request:
https://github.com/apache/spark/pull/1929#issuecomment-52694293
Thanks @heathermiller for summing up our conversation.
I realized that I introduced a bit of confusion with my
[comment](https://issues.scala-lang.org/browse/SI-6502?focusedCommentId=70407&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-70407)
on SI-6502. Let me clarify.
The logic being removed by scala/scala#3884 does support the
scenario of appending a jar to the classpath, as demonstrated by @rcsenkbeil's
comment. However, that piece of logic promises much more: it promises to handle
arbitrary changes to the classpath, including removals and changes to individual
classes. It doesn't deliver on that promise. We discovered all sorts
of subtle issues while working on a resident compilation mode for scalac. This is
not a real surprise: the Scala compiler wasn't designed with resident compilation
in mind.
To sum it up: `invalidateClassPathEntries` is a half-baked implementation
of a very ambitious goal. It comes with little documentation, zero tests, and no
clients using it. For that reason we decided to remove it.
As I mentioned in SI-6502, we have a good chance to implement a small
subset of what `invalidateClassPathEntries` does, and that should be enough to
satisfy Spark's needs. You guys don't need full-blown resident compilation
logic just to load a jar.
How do we determine what that small subset is? As I said in SI-6502,
the API that the `:cp` command needs has to deal with two concerns:
1. What should happen when the appended jar contains a package that overlaps
with what's already on the classpath?
2. What should happen when the appended jar contains a class that shadows an
existing class?
One might think that the first point is not important because it's
unlikely that the same package will span several jars. That would be true if
packages were flat. However, in Scala packages are *nested*. So if you have one
jar with the `org.apache.spark` package and another jar with the `org.apache.commons`
package, then you have overlapping packages. From Scala's point of view you have:
```
# first jar
org
  apache
    spark
# second jar
org
  apache
    commons
```
When appending the second jar, you must merge the `org` and `apache` packages. What
you merge are the contents of `PackageSymbol`s, where a `PackageSymbol` is the symbol
that represents a single (nested) package from the classpath. `invalidateClassPathEntries`
handles this scenario by clearing the entire contents of the symbol representing `org`,
reloading fresh entries from the classpath, and making sure that the old symbol
for the `spark` package is not lost (so we end up with just one symbol representing
the `spark` package).
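To make the merge concrete, here is a minimal, self-contained sketch of the idea using
plain Scala collections. It is *not* scalac's symbol table or the `PackageSymbol` API,
it only illustrates why appending a jar has to reuse the existing `org` and `apache`
nodes instead of replacing them:
```scala
// Toy model: nested packages as a mutable tree; one node per package.
import scala.collection.mutable

final class PackageNode(val name: String) {
  val children = mutable.Map.empty[String, PackageNode]
}

object PackageMerge {
  /** Insert a dotted package path into the tree, reusing existing nodes. */
  def add(root: PackageNode, path: String): Unit = {
    var node = root
    for (part <- path.split('.'))
      node = node.children.getOrElseUpdate(part, new PackageNode(part))
  }

  def main(args: Array[String]): Unit = {
    val root = new PackageNode("<root>")
    add(root, "org.apache.spark")    // first jar
    add(root, "org.apache.commons")  // second jar

    // There is a single `org.apache` node, and the pre-existing `spark`
    // entry survived the merge alongside the new `commons` entry.
    val apache = root.children("org").children("apache")
    assert(apache.children.keySet == Set("spark", "commons"))
    println(apache.children.keys.mkString(", "))
  }
}
```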
The second point is extremely hard to implement properly. That's why I
think the alternative API to the current `invalidateClassPathEntries` should just
abort in that case.
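As a rough illustration of what "abort" could mean in practice, the sketch below
(a hypothetical helper, not an existing scalac or REPL API) lists the class files in
a jar with `java.util.jar.JarFile` and refuses the append if any of them would shadow
a class that is already known:
```scala
import java.util.jar.JarFile
import scala.collection.JavaConverters._

object AppendCheck {
  /** Fully-qualified names of the class files contained in a jar. */
  def classesIn(jarPath: String): Set[String] = {
    val jar = new JarFile(jarPath)
    try {
      jar.entries().asScala
        .map(_.getName)
        .filter(_.endsWith(".class"))
        .map(_.stripSuffix(".class").replace('/', '.'))
        .toSet
    } finally jar.close()
  }

  /** Right(()) if the jar is safe to append, Left(reason) otherwise. */
  def checkAppend(jarPath: String, alreadyOnClassPath: Set[String]): Either[String, Unit] = {
    val shadowed = classesIn(jarPath).intersect(alreadyOnClassPath)
    if (shadowed.isEmpty) Right(())
    else Left(s"refusing to append $jarPath: it would shadow ${shadowed.mkString(", ")}")
  }
}
```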
To sum it up, here's what needs to happen for Scala 2.11.x:
- define a minimal classpath manipulation API that is enough for Spark's
use case (see the sketch after this list)
- write tests for the scenarios outlined above
- implement the minimal classpath manipulation
(`invalidateClassPathEntries` can serve as inspiration)
- make sure that the implementation aborts on every case that is not
supported; this way we'll avoid weird compiler crashes caused by the classpath
and symbols getting out of sync
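For what it's worth, the API surface could be as small as something like the following.
The trait and method names are hypothetical and only meant to show the shape of the
contract, including the "abort on anything unsupported" posture; this is not a
scala/scala proposal:
```scala
trait MinimalClassPathOps {
  /** Append a jar to the classpath of a running compiler/REPL instance.
    *
    *  Supported: jars that only add new classes, including classes in
    *  packages that overlap with existing ones (package contents are merged).
    *  Unsupported: anything that shadows or removes an existing class; in
    *  that case the call must fail instead of letting the symbol table and
    *  the classpath drift out of sync.
    */
  def appendJar(jar: java.io.File): Either[String, Unit]
}
```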
I hope that helps a little bit.