Github user WeichenXu123 commented on the issue:
https://github.com/apache/spark/pull/19904
@sethah To verify the memory issue, you can add one line of test code against the current master, right here:
```
val modelFutures = ...
// Unpersist training data only when all models have trained
Future.sequence[Model[_], Iterable](modelFutures)(implicitly, executionContext)
  .onComplete { _ => trainingDataset.unpersist() } (executionContext)
// Evaluate models in a Future that will calculate a metric and allow model to be cleaned up
val foldMetricFutures = ....
// Wait for metrics to be calculated before unpersisting validation dataset
val foldMetrics = foldMetricFutures.map(ThreadUtils.awaitResult(_, Duration.Inf))
validationDataset.unpersist()
// Add test code here: fetch all models
val models = modelFutures.map(_.value.get.get)
foldMetrics
```
The test code I added here is **val models = modelFutures.map(_.value.get.get)**. It proves that all of these models are still held in memory: every one of them can still be fetched from its completed future.
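
To see the retention behavior outside of Spark, here is a minimal standalone sketch (the names `FutureRetentionDemo` and `DummyModel` are illustrative, not from this PR): a completed `Future` keeps a strong reference to its result, which is exactly why `_.value.get.get` can still return every trained model after training finishes.

```
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration.Duration

// Sketch only: DummyModel stands in for a trained Model[_].
object FutureRetentionDemo {
  case class DummyModel(weights: Array[Double])

  def main(args: Array[String]): Unit = {
    implicit val ec: ExecutionContext = ExecutionContext.global

    // "Train" three models asynchronously, as CrossValidator does per param setting.
    val modelFutures = (1 to 3).map { i =>
      Future { DummyModel(Array.fill(1000)(i.toDouble)) }
    }

    // Wait until every future has completed.
    Await.result(Future.sequence(modelFutures), Duration.Inf)

    // The completed futures still hold their results, so all models can be
    // fetched back, mirroring `modelFutures.map(_.value.get.get)` above.
    val models = modelFutures.map(_.value.get.get)
    println(s"Models still reachable from the futures: ${models.size}")
  }
}
```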