GitHub user WeichenXu123 opened a pull request: https://github.com/apache/spark/pull/19857
[SPARK-22667][ML] Fix model-specific optimization support for ML tuning: Python API

## What changes were proposed in this pull request?

Python CrossValidator/TrainValidationSplit:

- With a base Estimator implemented in Scala/Java: convert the base Estimator to a Scala/Java object and call the JVM fit() (as in Weichen's comment).
- With a base Estimator implemented in Python: Python needs the same machinery for multi-model fitting and parallelism as Scala. We can call directly into it.

New API added:

```
class Estimator:
    def parallelFit(self, dataset, paramMaps, threadPool, modelCallback):
```

**Note** This PR also fixes the `# TODO: persist average metrics as well` in CV/TVS. The test suite needs to check the consistency of `avgMetrics`, so this had to be fixed here. If this needs to be backported to an older Spark version, I can split it into a separate PR.

## How was this patch tested?

Existing unit tests already cover each code path that needs testing.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/WeichenXu123/spark fix_model_spec_optim_py

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/19857.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #19857

----
commit 980c8ec87ddbc9f938942e78bb4cfe9753722bd2
Author: WeichenXu <weichen...@databricks.com>
Date:   2017-11-30T10:08:55Z

    init pr

----
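
For readers unfamiliar with the pattern, the `parallelFit(self, dataset, paramMaps, threadPool, modelCallback)` signature in the description can be illustrated with a minimal, Spark-free sketch. Everything below is an assumption made for illustration rather than the code in this PR: the toy `Estimator` and its `fit` method are hypothetical stand-ins, `collect_models` is a hypothetical helper, and the callback is assumed to receive `(model, index)`, which the description does not specify.

```python
from multiprocessing.pool import ThreadPool


class Estimator:
    """Toy stand-in for a Python-side estimator (not pyspark.ml.Estimator)."""

    def fit(self, dataset, params):
        # Hypothetical single-model fit: the "model" is just the dataset
        # mean multiplied by params["scale"].
        return params["scale"] * sum(dataset) / len(dataset)

    def parallelFit(self, dataset, paramMaps, threadPool, modelCallback):
        # Fit one model per param map on the given pool and hand each
        # result to the callback together with its param-map index
        # (the (model, index) callback shape is an assumption).
        def fit_one(index):
            model = self.fit(dataset, paramMaps[index])
            modelCallback(model, index)

        # ThreadPool.map blocks until every task has finished.
        threadPool.map(fit_one, range(len(paramMaps)))


def collect_models(estimator, dataset, paramMaps, parallelism=2):
    # Hypothetical caller, playing the role a tuning loop would play:
    # it owns the pool and stores each model in the slot of its param map.
    models = [None] * len(paramMaps)

    def on_model(model, index):
        models[index] = model

    pool = ThreadPool(processes=parallelism)
    try:
        estimator.parallelFit(dataset, paramMaps, pool, on_model)
    finally:
        pool.close()
        pool.join()
    return models
```

Passing the pool and the callback in from the caller, as the proposed signature does, lets the tuning code control the degree of parallelism and decide what to do with each model as soon as it is ready, while the estimator stays free to choose how it fits several param maps at once.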