GitHub user ethanluoyc opened a pull request: https://github.com/apache/spark/pull/15279
SPARK-12347 [ML] Add a script to test Spark ML examples.

This PR addresses [SPARK-12347](https://issues.apache.org/jira/browse/SPARK-12347?jql=project%20%3D%20SPARK%20AND%20resolution%20%3D%20Unresolved%20AND%20component%20%3D%20ML%20ORDER%20BY%20priority%20DESC) and may also be helpful for [SPARK-15571](https://issues.apache.org/jira/browse/SPARK-15571).

## What changes were proposed in this pull request?

This PR adds a Python script that drives all the examples located in the `examples` subdirectory. It should streamline testing of the R, Python, Scala and Java examples to check whether any of them have become incompatible with the codebase.

## How was this patch tested?

This PR is not yet ready for merging. I would like feedback on how the script can work best for the people who will actually be using it.

For now, it introduces the following features:

- [x] Select examples of a specific language to run (R, Python, Scala or Java).
- [x] Run in parallel (like run-tests.py does).
- [ ] Configure according to environment variables.
- [ ] **Run those examples that require arguments to be passed in.**

Note that the last TODO is really important, and I would like to hear suggestions from the reviewers on how it should best be implemented. For now, I think one good way would be to put comments in the example code that act as directives, similar to how `# $example on$` was introduced to facilitate doc generation. We could do something similar to hint at what arguments should be passed in for testing. Otherwise, we can always fall back to the approach we discussed in the JIRA [SPARK-12347](https://issues.apache.org/jira/browse/SPARK-12347).

Also, some of the functionality duplicates what is in run-tests.py. Perhaps we can find a way to integrate both?
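To make the comment-directive idea concrete, here is a minimal sketch of how a driver script might read test arguments out of an example's source. The directive name `$example args: ...$`, the regular expression, and the function `parse_example_args` are all hypothetical and are not part of this PR or of any existing Spark convention; they only illustrate one possible shape for the feature.

```python
import re
import shlex

# Hypothetical directive, modelled loosely on the existing "$example on$" markers.
# Accepts "#" comments (Python, R) and "//" comments (Scala, Java), e.g.
#   # $example args: data/mllib/sample_libsvm_data.txt 10$
DIRECTIVE = re.compile(r"^\s*(?:#|//)\s*\$example args:\s*(.*?)\s*\$\s*$")


def parse_example_args(source):
    """Return the arguments from the first directive in `source`, or [] if none."""
    for line in source.splitlines():
        match = DIRECTIVE.match(line)
        if match:
            # shlex.split honours shell-style quoting, so quoted arguments
            # containing spaces stay in one piece.
            return shlex.split(match.group(1))
    return []


# Illustrative example sources (assumed content, not real files from the repo).
python_example = """\
from pyspark.sql import SparkSession
# $example args: data/mllib/sample_libsvm_data.txt 10$
# $example on$
"""
scala_example = '// $example args: --k 3 "my file.txt"$\nobject Foo'

print(parse_example_args(python_example))
print(parse_example_args(scala_example))
```

An example without a directive yields an empty list, so under this assumption the driver could keep running argument-free examples unchanged and only append arguments where an author has opted in.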
You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ethanluoyc/spark test-examples

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/15279.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #15279

----
commit c566a5bfe72aa9be10d9b3f90ea18ec0d0382f93
Author: ethanluoyc <ethanlu...@gmail.com>
Date:   2016-09-28T12:35:38Z

    Add a script to test Spark examples.

----