GitHub user rcsenkbeil opened a pull request:
https://github.com/apache/spark/pull/4034
[REPL][SPARK-4923] Add Developer API to REPL to allow re-publishing the
REPL jar
As requested in
[SPARK-4923](https://issues.apache.org/jira/browse/SPARK-4923), I've provided a
rough DeveloperApi for the REPL. I've only done this for Scala 2.10 because the
Scala 2.11 REPL does not appear to be fully implemented: it still uses the old
`scala.tools.nsc` package, and its SparkIMain does not appear to have
the class server needed for shipping code over (unless this functionality has
been moved elsewhere?). I also left `ExecutorClassLoader` and
`ConstructorCleaner` alone, as I have no experience working with those classes.
This change marks the majority of methods in `SparkIMain` as _private_, with a
few special cases being _private[repl]_ because other classes within the same
package access them. Every public method has been marked with `@DeveloperApi`,
as suggested by @pwendell, and I took the liberty of writing a Scaladoc for
each one to further explain its usage.
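The visibility scheme described above can be illustrated with a small standalone sketch. Everything in it is hypothetical: the `DeveloperApi` class is a local stand-in so the snippet compiles without Spark on the classpath (the real annotation lives in `org.apache.spark.annotation`), the `repl` object stands in for the `repl` package, and `Interp`, `Completion`, and their members are made-up names rather than actual `SparkIMain` members.

```scala
import scala.annotation.StaticAnnotation

// Local stand-in (assumption) so the sketch compiles without Spark.
class DeveloperApi extends StaticAnnotation

// An object is used here in place of the real `repl` package.
object repl {
  class Interp {
    // Internal detail: not visible outside this class.
    private var lastRequest: String = ""

    // Shared with collaborators inside `repl`, hidden from external callers.
    private[repl] def record(code: String): Unit = { lastRequest = code }

    /** Returns the most recently recorded snippet. */
    @DeveloperApi
    def mostRecent: String = lastRequest
  }

  // A sibling class in the same scope may call the private[repl] member.
  class Completion(interp: Interp) {
    def touch(code: String): Unit = interp.record(code)
  }
}
```

Code outside `repl` can call `mostRecent` but not `record`, which is the same split the PR applies between the documented developer-facing surface and the helpers shared within the package.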
As the Scala 2.11 REPL
[conforms](https://github.com/scala/scala/pull/2206) to
[JSR-223](http://docs.oracle.com/javase/8/docs/technotes/guides/scripting/),
the [Spark Kernel](https://github.com/ibm-et/spark-kernel) uses the SparkIMain
of Scala 2.10 in the same manner. So, I've taken care to expose methods
predominantly related to the functionality needed for a JSR-223 scripting
engine implementation:
1. The ability to _get_ variables from the interpreter (and other
information like class/symbol/type)
2. The ability to _put_ variables into the interpreter
3. The ability to _compile_ code
4. The ability to _execute_ code
5. The ability to get contextual information regarding the scripting
environment
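For readers unfamiliar with JSR-223, the five capabilities above correspond roughly to `ScriptEngine.get`, `ScriptEngine.put`, `Compilable.compile`, `ScriptEngine.eval`, and `ScriptEngine.getContext` in the standard `javax.script` API. The sketch below shows only the put/execute/get round trip from a client's point of view. `LookupEngine` is a hypothetical toy engine written for this example (it "executes" a script by looking up the binding with that name); it is not how `SparkIMain` or the Spark Kernel is implemented.

```scala
import java.io.Reader
import javax.script.{AbstractScriptEngine, Bindings, ScriptContext,
  ScriptEngine, ScriptEngineFactory, SimpleBindings}

// Toy engine (assumption): the script text is treated as a variable name
// and evaluation returns whatever is bound to it, or null if unbound.
class LookupEngine extends AbstractScriptEngine {
  override def eval(script: String, ctx: ScriptContext): AnyRef =
    ctx.getAttribute(script.trim)

  override def eval(reader: Reader, ctx: ScriptContext): AnyRef = {
    // Drain the reader into a string, then reuse the String overload.
    val sb = new StringBuilder
    var c = reader.read()
    while (c != -1) { sb.append(c.toChar); c = reader.read() }
    eval(sb.toString, ctx)
  }

  override def createBindings(): Bindings = new SimpleBindings()

  // No factory is registered for this toy engine.
  override def getFactory(): ScriptEngineFactory = null
}

object Jsr223Sketch {
  // Client code depends only on the standard ScriptEngine interface.
  def roundTrip(engine: ScriptEngine): AnyRef = {
    engine.put("x", Int.box(3)) // put a variable into the engine
    engine.eval("x")            // execute code that reads it back
  }
}
```

Because `roundTrip` is written against `ScriptEngine` rather than a concrete class, the same client code would work with any engine that implements the interface, which is the motivation for exposing this particular subset of methods.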
I also exposed the following additional functionality:
1. The blocking initialization method (needed to actually start a SparkIMain
instance)
2. The class server URI (needed to set the _spark.repl.class.uri_ property
after initialization), reduced from the entire class server
3. The class output directory (beneficial for tools like ours that need to
inspect and use the directory where class files are served)
4. Suppression (quiet/silence) mechanics for output
5. Ability to add a jar to the compile/runtime classpath
6. The reset/close functionality
7. Metric information (last variable assignment, "needed" for extracting
results from last execution, real variable name for better debugging)
8. Execution wrapper (useful to have, but debatable)
Aside from `SparkIMain`, I updated other classes/traits and their methods
in the _repl_ package to be private/package protected where possible. A few odd
cases (like the SparkHelper being in the scala.tools.nsc package to expose a
private variable) still exist, but I did my best at labelling them.
`SparkCommandLine` has proven useful for extracting settings and
`SparkJLineCompletion` has proven useful for implementing auto-completion
in the [Spark Kernel](https://github.com/ibm-et/spark-kernel) project. Other
than those - and `SparkIMain` - in my experience, other
classes/methods are not necessary for interactive applications taking advantage
of the REPL API.
Tested via the following:
$ export MAVEN_OPTS="-Xmx2g -XX:MaxPermSize=512M -XX:ReservedCodeCacheSize=512m"
$ mvn -Phadoop-2.3 -DskipTests clean package && mvn -Phadoop-2.3 test
Also did a quick verification that I could start the shell and execute some
code:
$ ./bin/spark-shell
...
scala> val x = 3
x: Int = 3
scala> sc.parallelize(1 to 10).reduce(_+_)
...
res1: Int = 55
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/rcsenkbeil/spark AddDeveloperApiToRepl
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/4034.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #4034
----
commit 925c1127862c48a4ac15d498b50336c8010a2587
Author: Chip Senkbeil <[email protected]>
Date: 2015-01-14T03:56:04Z
Added DeveloperApi and Scaladocs to SparkIMain for Scala 2.10
commit 26fd2861637565af6ba717a9e9a3097a07d57f27
Author: Chip Senkbeil <[email protected]>
Date: 2015-01-14T06:07:33Z
Refactored other Scala 2.10 classes and methods to be private/package
protected where possible
----