GitHub user ethanluoyc opened a pull request:

    https://github.com/apache/spark/pull/15279

    SPARK-12347 [ML] Add a script to test Spark ML examples.

    This PR addresses 
[SPARK-12347](https://issues.apache.org/jira/browse/SPARK-12347?jql=project%20%3D%20SPARK%20AND%20resolution%20%3D%20Unresolved%20AND%20component%20%3D%20ML%20ORDER%20BY%20priority%20DESC)
 and may also be helpful for 
[SPARK-15571](https://issues.apache.org/jira/browse/SPARK-15571).
    
    ## What changes were proposed in this pull request?
    
    This PR adds a Python script that drives all the examples located in the 
`examples` subdirectory. It should streamline testing of the examples in R, 
Python, Scala and Java to check whether any of them behaves incompatibly with 
the current codebase.
    
    ## How was this patch tested?
    
    This PR is not yet ready for merging. I would like reviewers' feedback on 
how the script can best serve those who will actually be using it. 
For now, it introduces the following features:
    
    - [x] select examples of a specific language to run (R, Python, Scala or 
Java)
    - [x] run examples in parallel (as run-tests.py does)
    - [ ] configure according to environment variables
    - [ ] **Run those examples requiring arguments to be passed in.**
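
As a rough illustration of the first two items, selecting examples by language and running them in parallel could look like the sketch below. This is not the code in the PR: the `EXTENSIONS` mapping and the `find_examples`, `run_one` and `run_all` names are hypothetical, and how each file is turned into a launch command is deliberately left to the caller.

```python
import os
import subprocess
from concurrent.futures import ThreadPoolExecutor

# Hypothetical mapping from language name to source-file extension.
EXTENSIONS = {"python": ".py", "r": ".R", "scala": ".scala", "java": ".java"}


def find_examples(root, language):
    """Return sorted paths under `root` whose extension matches `language`."""
    ext = EXTENSIONS[language.lower()]
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(ext):
                found.append(os.path.join(dirpath, name))
    return sorted(found)


def run_one(command):
    """Run one command (a list of strings) and return (command, exit code)."""
    proc = subprocess.run(
        command, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    return command, proc.returncode


def run_all(commands, parallelism=4):
    """Run the commands concurrently and return those that exited non-zero."""
    with ThreadPoolExecutor(max_workers=parallelism) as pool:
        results = list(pool.map(run_one, commands))
    return [command for command, code in results if code != 0]
```

For Python examples, the command list might be built as `[["bin/spark-submit", path] for path in find_examples("examples/src/main/python", "python")]`. Scala and Java examples are launched by class name rather than by file path, so they would need a different mapping from file to command.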
    
    Note that the last TODO is really important, so I would like to hear 
suggestions from the reviewers on how best to implement it. 
For now, I think one good way would be to use comments as directives in the 
example code, much like the `# $example on$` markers that were introduced to 
facilitate doc generation. We could do something similar to hint at which 
arguments should be passed in for testing. Otherwise, we can always fall back 
to the approach discussed in the JIRA 
[SPARK-12347](https://issues.apache.org/jira/browse/SPARK-12347).
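
To make the directive idea concrete, here is a minimal sketch of how the script might read such hints. The `$example args: ...$` directive syntax is purely hypothetical (only `$example on$` and its counterpart exist today), and the `parse_example_args` name is made up for illustration.

```python
import re
import shlex

# Hypothetical directive, modeled on the existing `$example on$` markers:
#     # $example args: <arguments>$      (Python, R)
#     // $example args: <arguments>$     (Scala, Java)
ARGS_DIRECTIVE = re.compile(r"^\s*(?:#|//)\s*\$example args:(.*)\$\s*$")


def parse_example_args(source):
    """Return the argument list from the first args directive, or [] if none."""
    for line in source.splitlines():
        match = ARGS_DIRECTIVE.match(line)
        if match:
            # shlex.split honors shell-style quoting, so quoted arguments
            # containing spaces stay together.
            return shlex.split(match.group(1))
    return []
```

An example containing the comment `# $example args: input.txt 10$` would then be launched with the arguments `input.txt` and `10`, while examples without a directive would be run with no arguments.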
    
    Also, some of the functionality duplicates what run-tests.py already does. 
Perhaps we can find a way to integrate the two?
    


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ethanluoyc/spark test-examples

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/15279.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #15279
    
----
commit c566a5bfe72aa9be10d9b3f90ea18ec0d0382f93
Author: ethanluoyc <ethanlu...@gmail.com>
Date:   2016-09-28T12:35:38Z

    Add a script to test Spark examples.

----


