[GitHub] spark issue #19904: [SPARK-22707][ML] Optimize CrossValidator memory occupat...

WeichenXu123 Thu, 07 Dec 2017 23:22:24 -0800

Github user WeichenXu123 commented on the issue:

    https://github.com/apache/spark/pull/19904
  
    @sethah To verify the memory issue, you can add one line of test code at this point in the CrossValidator code on current master:
    
    ```
          val modelFutures = ...
          // Unpersist training data only when all models have trained
          Future.sequence[Model[_], Iterable](modelFutures)(implicitly, executionContext)
            .onComplete { _ => trainingDataset.unpersist() } (executionContext)
          // Evaluate models in a Future that will calculate a metric and allow model to be cleaned up
          val foldMetricFutures = ....
          // Wait for metrics to be calculated before unpersisting validation dataset
          val foldMetrics = foldMetricFutures.map(ThreadUtils.awaitResult(_, Duration.Inf))
          validationDataset.unpersist()
    
          // add test code here: fetch all models
          val models = modelFutures.map(_.value.get.get)
    
          foldMetrics
    ```
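    The test code I added is `val models = modelFutures.map(_.value.get.get)`. Because this line can still fetch every trained model after all fold metrics have been computed and both datasets have been unpersisted, it proves that the models are still held in memory: each completed future in `modelFutures` keeps a reference to its model.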
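    The retention effect can be reproduced without Spark. The sketch below is a minimal illustration under stated assumptions: `FakeModel` is a hypothetical stand-in for a trained model, and "Pattern B" is just one way to avoid holding every model at once, not necessarily the approach taken in this PR. Pattern A mirrors the snippet above: metric futures are derived from model futures, and because a completed `Future` stores its result, every model stays reachable while `modelFutures` is. Pattern B trains and evaluates inside one `Future`, so only the `Double` metric is stored.

    ```scala
    import scala.concurrent.{Await, ExecutionContext, Future}
    import scala.concurrent.duration.Duration

    // Hypothetical stand-in for a trained Spark ML model; `weights` mimics
    // the memory a real model would occupy.
    final case class FakeModel(id: Int, weights: Array[Double])

    implicit val ec: ExecutionContext = ExecutionContext.global

    // Pattern A (mirrors the CrossValidator snippet): keep the model futures
    // and derive metric futures from them.
    val modelFutures: Seq[Future[FakeModel]] =
      (0 until 3).map(i => Future(FakeModel(i, Array.fill(1000)(i.toDouble))))
    val metricFutures: Seq[Future[Double]] =
      modelFutures.map(_.map(m => m.weights.sum))
    val metricsA: Seq[Double] = metricFutures.map(Await.result(_, Duration.Inf))

    // Each metric future completes only after its model future, so every
    // model future is complete here. `value` is Option[Try[FakeModel]], and
    // the completed futures still hold every FakeModel and its weights.
    val retained: Seq[FakeModel] = modelFutures.map(_.value.get.get)

    // Pattern B (hypothetical alternative): train and evaluate inside one
    // Future, so only the Double is stored and each FakeModel becomes
    // eligible for garbage collection once its Future body returns.
    val metricsB: Seq[Double] = (0 until 3)
      .map { i =>
        Future {
          val model = FakeModel(i, Array.fill(1000)(i.toDouble))
          model.weights.sum
        }
      }
      .map(Await.result(_, Duration.Inf))
    ```

    Both patterns produce the same metrics; they differ only in how long the models stay reachable.
    
    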


