GitHub user rcsenkbeil opened a pull request:

    https://github.com/apache/spark/pull/4034

    [REPL][SPARK-4923] Add Developer API to REPL to allow re-publishing the 
REPL jar

    As requested in
[SPARK-4923](https://issues.apache.org/jira/browse/SPARK-4923), I've provided a
rough DeveloperApi for the REPL. I've only done this for Scala 2.10 because the
Scala 2.11 REPL does not appear to be fully implemented yet: it still uses the
old `scala.tools.nsc` package, and its SparkIMain does not appear to have the
class server needed to ship code to executors (unless this functionality has
been moved elsewhere?). I also left the `ExecutorClassLoader` and
`ConstructorCleaner` alone, as I have no experience working with those classes.
    
    This marks the majority of methods in `SparkIMain` as _private_, with a few
special cases being _private[repl]_ because other classes within the same
package access them. Every remaining public method has been marked with
`@DeveloperApi` as suggested by @pwendell, and I took the liberty of writing a
Scaladoc for each one to further document its usage.
    
    As the Scala 2.11 REPL
[conforms](https://github.com/scala/scala/pull/2206) to
[JSR-223](http://docs.oracle.com/javase/8/docs/technotes/guides/scripting/),
and the [Spark Kernel](https://github.com/ibm-et/spark-kernel) uses the Scala
2.10 SparkIMain in the same manner, I've taken care to expose the methods
predominantly needed to implement a JSR-223 scripting engine (a sketch follows
this list):
    
    1. The ability to _get_ variables from the interpreter (and other 
information like class/symbol/type)
    2. The ability to _put_ variables into the interpreter
    3. The ability to _compile_ code
    4. The ability to _execute_ code
    5. The ability to get contextual information regarding the scripting 
environment
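
    To make this concrete, below is a minimal sketch of most of those
operations. It assumes the exposed methods mirror the Scala 2.10 `IMain` API
that `SparkIMain` is derived from (`initializeSynchronous`, `bind`,
`interpret`, `valueOfTerm`, `compileString`); treat the exact signatures as
assumptions rather than the final DeveloperApi surface:

        import scala.tools.nsc.Settings
        import org.apache.spark.repl.SparkIMain

        val settings = new Settings
        settings.usejavacp.value = true

        val intp = new SparkIMain(settings)
        intp.initializeSynchronous()  // blocking init (see the next list)

        // (2) put: bind an external value into the interpreter
        intp.bind("answer", "Int", 42)

        // (4) execute: interpret a line against the bound variable
        intp.interpret("val doubled = answer * 2")

        // (1) get: read the resulting variable back out (Option[AnyRef])
        val doubled = intp.valueOfTerm("doubled")

        // (3) compile: check code without executing it
        val ok = intp.compileString("object Probe { def ping = true }")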
    
    Additional functionality that I marked as exposed includes the following
(see the lifecycle sketch after this list):
    
    1. The blocking initialization method (needed to actually start a
SparkIMain instance)
    2. The class server uri (needed to set the _spark.repl.class.uri_ property
after initialization), exposed on its own rather than as the entire class
server
    3. The class output directory (beneficial for tools like ours that need to 
inspect and use the directory where class files are served)
    4. Suppression (quiet/silence) mechanics for output
    5. Ability to add a jar to the compile/runtime classpath
    6. The reset/close functionality
    7. Metric information (last variable assignment, "needed" for extracting 
results from last execution, real variable name for better debugging)
    8. Execution wrapper (useful to have, but debatable)
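
    Below is a rough lifecycle sketch tying several of those together,
continuing from the sketch above. The names `classServerUri` and
`beQuietDuring` follow this PR and the Scala 2.10 `IMain` it wraps, but again
treat them as assumptions:

        val intp = new SparkIMain(settings)
        intp.initializeSynchronous()        // (1) blocking initialization

        // (2) point executors at the REPL class server after init
        System.setProperty("spark.repl.class.uri", intp.classServerUri)

        // (4) suppress interpreter output while running setup code
        intp.beQuietDuring {
          intp.interpret("val setupDone = true")
        }

        // (6) reset the interpreter state, then shut everything down
        intp.reset()
        intp.close()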
    
    Aside from `SparkIMain`, I updated other classes/traits and their methods
in the _repl_ package to be private or package-private where possible. A few
odd cases (like `SparkHelper` living in the `scala.tools.nsc` package to expose
a private variable) still exist, but I did my best to label them.
    
    `SparkCommandLine` has proven useful for extracting settings, and
`SparkJLineCompletion` for implementing auto-completion in the
[Spark Kernel](https://github.com/ibm-et/spark-kernel) project (a sketch
follows). Other than those - and `SparkIMain` - my experience has been that no
other classes/methods are necessary for interactive applications building on
the REPL API.
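
    As a quick illustration of how those two fit together, here's roughly how
an interactive application can wire them up. The signatures are assumed from
their Scala 2.10 counterparts (`CompilerCommand` and `JLineCompletion`):

        import org.apache.spark.repl.{SparkCommandLine, SparkJLineCompletion}

        // extract compiler settings from command-line style arguments
        val command = new SparkCommandLine(List("-usejavacp"))

        val intp = new SparkIMain(command.settings)
        intp.initializeSynchronous()

        // tab completion backed by the live interpreter
        val completion = new SparkJLineCompletion(intp)
        val result = completion.completer().complete("List(1).ma", 10)
        // result.candidates should include "map" and "max"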
    
    Tested via the following:
    
        $ export MAVEN_OPTS="-Xmx2g -XX:MaxPermSize=512M 
-XX:ReservedCodeCacheSize=512m"
        $ mvn -Phadoop-2.3 -DskipTests clean package && mvn -Phadoop-2.3 test
    
    Also did a quick verification that I could start the shell and execute some 
code:
    
        $ ./bin/spark-shell
        ...
    
        scala> val x = 3
        x: Int = 3
    
        scala> sc.parallelize(1 to 10).reduce(_+_)
        ...
        res1: Int = 55

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/rcsenkbeil/spark AddDeveloperApiToRepl

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/4034.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #4034
    
----
commit 925c1127862c48a4ac15d498b50336c8010a2587
Author: Chip Senkbeil <[email protected]>
Date:   2015-01-14T03:56:04Z

    Added DeveloperApi and Scaladocs to SparkIMain for Scala 2.10

commit 26fd2861637565af6ba717a9e9a3097a07d57f27
Author: Chip Senkbeil <[email protected]>
Date:   2015-01-14T06:07:33Z

    Refactored other Scala 2.10 classes and methods to be private/package 
protected where possible

----

