GitHub user WeichenXu123 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19208#discussion_r148926895
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/tuning/CrossValidator.scala ---
    @@ -117,6 +123,12 @@ class CrossValidator @Since("1.2.0") (@Since("1.4.0") override val uid: String)
         instr.logParams(numFolds, seed, parallelism)
         logTuningParams(instr)
     
    +    val collectSubModelsParam = $(collectSubModels)
    +
    +    var subModels: Option[Array[Array[Model[_]]]] = if (collectSubModelsParam) {
    --- End diff --
    
    @holdenk Oh, sorry for the confusion. Yes, if `collectSubModelsParam` is 
set, the memory cost will always be `$(estimatorParamMaps).length * sizeof(model)`. 
If I follow your suggestion, we would have to duplicate some of the logic 
(correct me if I am wrong):
    - When `collectSubModelsParam` is set, we cannot pipeline `modelFutures` and 
`foldMetricFutures`. We would have to execute `modelFutures` and collect the 
results first, then change the `foldMetricFutures` logic so that it is passed 
the `model` directly instead of using `modelFuture.map { model => ... }`.
    - When `collectSubModelsParam` is not set, we keep the current `modelFutures` 
and `foldMetricFutures` and pipeline them as they are now.
    So your suggestion seems to need more code. Do you still prefer this 
approach, or do you have a better way to implement it?
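
    To make the two cases concrete, here is a minimal, self-contained sketch of 
the duplication I mean. It is not the actual `CrossValidator` code: `Model`, 
`fit`, `evaluate`, and the object name `SubModelSketch` are hypothetical 
stand-ins, and the global execution context is only an assumption for the 
example.

```scala
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object SubModelSketch {
  // Assumption: the global pool stands in for the tuning thread pool.
  implicit val ec: ExecutionContext = ExecutionContext.global

  // Hypothetical stand-ins for a fitted model, Estimator.fit and Evaluator.evaluate.
  final case class Model(paramIndex: Int)
  def fit(paramIndex: Int): Model = Model(paramIndex)
  def evaluate(model: Model): Double = model.paramIndex * 0.1

  // collectSubModelsParam not set: fitting and evaluation are chained on the
  // same future, and no separate array of models is built.
  def pipelined(numParamMaps: Int): Array[Double] = {
    val modelFutures = (0 until numParamMaps).map(i => Future(fit(i)))
    val foldMetricFutures = modelFutures.map(_.map(model => evaluate(model)))
    Await.result(Future.sequence(foldMetricFutures), 1.minute).toArray
  }

  // collectSubModelsParam set: wait for every model, keep them all, then
  // start the metric futures by passing each model in directly.
  def collecting(numParamMaps: Int): (Array[Model], Array[Double]) = {
    val modelFutures = (0 until numParamMaps).map(i => Future(fit(i)))
    val models = Await.result(Future.sequence(modelFutures), 1.minute).toArray
    val foldMetricFutures = models.toSeq.map(model => Future(evaluate(model)))
    val metrics = Await.result(Future.sequence(foldMetricFutures), 1.minute).toArray
    (models, metrics)
  }
}
```

    Both paths produce the same metrics in the same order; the second one 
needs an extra blocking step and a second way of building `foldMetricFutures`, 
which is the extra code I am worried about.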


