Github user mgaido91 commented on a diff in the pull request:
https://github.com/apache/spark/pull/16774#discussion_r136868226
--- Diff: mllib/src/main/scala/org/apache/spark/ml/tuning/CrossValidator.scala ---
@@ -100,31 +113,53 @@ class CrossValidator @Since("1.2.0") (@Since("1.4.0")
override val uid: String)
val eval = $(evaluator)
val epm = $(estimatorParamMaps)
val numModels = epm.length
- val metrics = new Array[Double](epm.length)
+
+ // Create execution context based on $(parallelism)
+ val executionContext = getExecutionContext
--- End diff --
In the corresponding PR for the PySpark implementation, the number of
threads is limited to the number of models to be trained
(https://github.com/WeichenXu123/spark/blob/be2f3d0ec50db4730c9e3f9a813a4eb96889f5b6/python/pyspark/ml/tuning.py#L261).
We might do the same here, for instance by overriding the
`getParallelism` method. What do you think about this?
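The idea in the linked PySpark code can be sketched roughly as follows
(hypothetical helper names, not the actual `pyspark.ml.tuning` code):
there is no point spawning more worker threads than there are models to
fit, so the configured parallelism is capped at the model count.

```python
from multiprocessing.pool import ThreadPool

def capped_parallelism(parallelism, num_models):
    # Never use more threads than there are models to train,
    # but always keep at least one worker.
    return max(1, min(parallelism, num_models))

# e.g. parallelism=8 but only 3 candidate models: use a 3-thread pool
pool = ThreadPool(processes=capped_parallelism(8, 3))
```

A Scala `CrossValidator` could apply the same cap when sizing the
execution context, e.g. `math.min($(parallelism), numModels)`.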
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]