Github user gkossakowski commented on the pull request:
https://github.com/apache/spark/pull/1929#issuecomment-52694293
Thanks @heathermiller for summing up our conversation.
I realized that I introduced a bit of confusion with my
[comment](https://issues.scala-lang.org/browse/SI-6502?focusedCommentId=70407&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-70407)
on SI-6502. Let me clarify.
The logic being removed by scala/scala#3884 does support the
scenario of appending a jar to the classpath, as demonstrated by @rcsenkbeil's
comment. However, that piece of logic promises much more: it promises to handle
arbitrary changes to the classpath, including removals and changes to individual
classes. It doesn't deliver on that promise. We discovered all sorts
of subtle issues while working on a resident compilation mode for scalac. This is
not a real surprise: the Scala compiler wasn't designed with resident compilation
in mind.
To sum it up: `invalidateClassPathEntries` is a half-baked implementation
of a very ambitious goal. It comes with little documentation, zero tests, and no
clients using it. For that reason we decided to remove it.
As I mentioned in SI-6502, we have a good chance to implement a small
subset of what `invalidateClassPathEntries` does, and that should be enough to
satisfy Spark's needs. You guys don't need full-blown resident compilation
logic just to load a jar.
How do we determine what that small subset is? As I said in SI-6502,
the API that the `:cp` command needs has to deal with two concerns:
1. What should happen when the appended jar contains a package that overlaps
with what's already on the classpath?
2. What should happen when the appended jar contains a class that shadows an
existing class?
One might think that the first point is not important because it's
unlikely that the same package will span several jars. That would be true if
packages were flat. However, in Scala packages are *nested*. So if you have one
jar with the `org.apache.spark` package and another jar with the `org.apache.commons`
package, then you have overlapping packages. From Scala's point of view you have:
```
# first jar
org
  apache
    spark
# second jar
org
  apache
    commons
```
When appending the second jar, you must merge the `org` and `apache` packages. What
you merge are the contents of `PackageSymbol`s, where a `PackageSymbol` is the symbol
that represents a single (nested) package from the classpath. `invalidateClassPathEntries`
handles this scenario by clearing the entire contents of the symbol representing `org`,
reloading fresh entries from the classpath, and making sure that the old symbol
for the `spark` package is not lost (so we end up with just one symbol representing
the `spark` package).
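To make the merge concrete, here is a minimal, self-contained sketch of the idea using
plain Scala collections. It is *not* scalac's symbol table or the `PackageSymbol` API,
it only illustrates why appending a jar has to reuse the existing `org` and `apache`
nodes instead of replacing them:
```scala
// Toy model: nested packages as a mutable tree; one node per package.
import scala.collection.mutable

final class PackageNode(val name: String) {
  val children = mutable.Map.empty[String, PackageNode]
}

object PackageMerge {
  /** Insert a dotted package path into the tree, reusing existing nodes. */
  def add(root: PackageNode, path: String): Unit = {
    var node = root
    for (part <- path.split('.'))
      node = node.children.getOrElseUpdate(part, new PackageNode(part))
  }

  def main(args: Array[String]): Unit = {
    val root = new PackageNode("<root>")
    add(root, "org.apache.spark")    // first jar
    add(root, "org.apache.commons")  // second jar

    // There is a single `org.apache` node, and the pre-existing `spark`
    // entry survived the merge alongside the new `commons` entry.
    val apache = root.children("org").children("apache")
    assert(apache.children.keySet == Set("spark", "commons"))
    println(apache.children.keys.mkString(", "))
  }
}
```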
The second point is extremely hard to implement properly. That's why I
think the alternative API to the current `invalidateClassPathEntries` should just
abort in that case.
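As a rough illustration of what "abort" could mean in practice, the sketch below
(a hypothetical helper, not an existing scalac or REPL API) lists the class files in
a jar with `java.util.jar.JarFile` and refuses the append if any of them would shadow
a class that is already known:
```scala
import java.util.jar.JarFile
import scala.collection.JavaConverters._

object AppendCheck {
  /** Fully-qualified names of the class files contained in a jar. */
  def classesIn(jarPath: String): Set[String] = {
    val jar = new JarFile(jarPath)
    try {
      jar.entries().asScala
        .map(_.getName)
        .filter(_.endsWith(".class"))
        .map(_.stripSuffix(".class").replace('/', '.'))
        .toSet
    } finally jar.close()
  }

  /** Right(()) if the jar is safe to append, Left(reason) otherwise. */
  def checkAppend(jarPath: String, alreadyOnClassPath: Set[String]): Either[String, Unit] = {
    val shadowed = classesIn(jarPath).intersect(alreadyOnClassPath)
    if (shadowed.isEmpty) Right(())
    else Left(s"refusing to append $jarPath: it would shadow ${shadowed.mkString(", ")}")
  }
}
```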
To sum it up, here's what needs to happen for Scala 2.11.x:
- define a minimal classpath manipulation API that is enough for Spark's
use case (see the sketch after this list)
- write tests for the scenarios outlined above
- implement the minimal classpath manipulation
(`invalidateClassPathEntries` can serve as inspiration)
- make sure that the implementation aborts on every case that is not
supported; this way we'll avoid weird compiler crashes caused by the classpath
and symbols getting out of sync
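For what it's worth, the API surface could be as small as something like the following.
The trait and method names are hypothetical and only meant to show the shape of the
contract, including the "abort on anything unsupported" posture; this is not a
scala/scala proposal:
```scala
trait MinimalClassPathOps {
  /** Append a jar to the classpath of a running compiler/REPL instance.
    *
    *  Supported: jars that only add new classes, including classes in
    *  packages that overlap with existing ones (package contents are merged).
    *  Unsupported: anything that shadows or removes an existing class; in
    *  that case the call must fail instead of letting the symbol table and
    *  the classpath drift out of sync.
    */
  def appendJar(jar: java.io.File): Either[String, Unit]
}
```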
I hope that helps a little bit.