Github user WeichenXu123 commented on the issue:
https://github.com/apache/spark/pull/19904
@MrBago
Your code https://github.com/apache/spark/pull/19904#discussion_r156751569
also works fine, I think, although it is more complicated.
@BryanCutler
> the unpersist of training data is not async anymore, but this also changes
> the order in which fit and evaluate are called so that training data is not
> unpersisted until all but the last models are also evaluated. Before, all
> modelFutures would be executed first before metricFutures and so training data
> could be unpersisted as soon as possible.
- My current PR code also unpersists the training dataset once all fitting has
finished (before evaluation). Also, calling `df.unpersist()` won't block
(see the `df.unpersist(blocking = false)` parameter), so moving it into the
training thread shouldn't cause any issues, I think.
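
To make the ordering concrete, here is a rough sketch in plain Python (no Spark) of one way to read "unpersist inside the training thread": each task fits and then evaluates, and whichever task finishes the last fit releases the training data. Everything here (`FakeDataset`, `fit`, `evaluate`, `run_fold`, `events`) is a hypothetical stand-in for illustration, not the code in the PR.

```python
from concurrent.futures import ThreadPoolExecutor
import threading

events = []                      # order of operations, for inspection only
_events_lock = threading.Lock()

def log(event):
    with _events_lock:
        events.append(event)

class FakeDataset:
    """Hypothetical stand-in for a cached DataFrame."""
    def __init__(self, name):
        self.name = name
        self.cached = True

    def unpersist(self, blocking=False):
        # Assumption: mirrors a non-blocking unpersist; here it only flips a flag.
        self.cached = False
        log(("unpersist", self.name))
        return self

def fit(param, training):
    # Stand-in for estimator.fit; the training data must still be cached.
    assert training.cached, "training data was unpersisted too early"
    log(("fit", param))
    return ("model", param)

def evaluate(model, validation):
    # Stand-in for evaluator.evaluate; returns the param as a dummy metric.
    log(("evaluate", model[1]))
    return model[1]

def run_fold(param_grid, training, validation, parallelism=2):
    remaining = len(param_grid)  # fits that have not finished yet
    counter_lock = threading.Lock()

    def fit_then_evaluate(param):
        nonlocal remaining
        model = fit(param, training)
        with counter_lock:
            remaining -= 1
            last_fit_done = remaining == 0
        if last_fit_done:
            # Called from the training thread; non-blocking, so it should
            # not hold up the evaluation that follows.
            training.unpersist(blocking=False)
        return evaluate(model, validation)

    with ThreadPoolExecutor(max_workers=parallelism) as pool:
        metrics = list(pool.map(fit_then_evaluate, param_grid))
    validation.unpersist(blocking=False)
    return metrics
```

In this sketch the training data is released right after the last fit completes, but some evaluations may already have run by then, which is the ordering change described in the quoted comment.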