[GitHub] spark issue #19904: [SPARK-22707][ML] Optimize CrossValidator memory occupat...

WeichenXu123 Wed, 13 Dec 2017 17:36:57 -0800

Github user WeichenXu123 commented on the issue:

    https://github.com/apache/spark/pull/19904
  
    @MrBago 
    Your code at https://github.com/apache/spark/pull/19904#discussion_r156751569 also works fine, I think, although it is more complicated.
    
    @BryanCutler 
    > the unpersist of training data is not async anymore, but this also changes the order in which fit and evaluate are called so that training data is not unpersisted until all but the last models are also evaluated. Before, all modelFutures would be executed first before metricFutures and so training data could be unpersisted as soon as possible.
    
    - My current PR code also unpersists the training dataset once all fitting has finished (before evaluation). In addition, calling `df.unpersist()` won't block, because it runs with `blocking = false` by default (see the `blocking` parameter of `df.unpersist`). So moving it into the training thread shouldn't cause any issue, I think.
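    A minimal sketch of the ordering described in the bullet above: every fit completes, then the cached training data is released with a non-blocking unpersist, and only then does evaluation start. This is not the PR's code or Spark's API. `CachedData`, `fit`, `evaluate`, `run_fold`, `record` and `events` are hypothetical stand-ins, and Python's `concurrent.futures` is used only to mimic the model/metric futures discussed in this thread.

```python
import threading
from concurrent.futures import ThreadPoolExecutor, wait

# Shared, thread-safe event log so the order of fit/unpersist/evaluate is visible.
events = []
_events_lock = threading.Lock()


def record(event):
    with _events_lock:
        events.append(event)


class CachedData:
    """Hypothetical stand-in for a cached DataFrame (NOT the Spark API)."""

    def __init__(self, name):
        self.name = name
        self.cached = True

    def unpersist(self, blocking=False):
        # Only mirrors the shape of DataFrame.unpersist(blocking=False);
        # here it just flips a flag and logs the call.
        self.cached = False
        record(f"unpersist {self.name}")


def fit(train, param):
    # Hypothetical training step: the data must still be cached here.
    assert train.cached, "training data released before fitting finished"
    record(f"fit {param}")
    return f"model[{param}]"


def evaluate(model, validation):
    # Hypothetical evaluation step returning a placeholder metric.
    record(f"evaluate {model}")
    return len(model)


def run_fold(train, validation, params, parallelism=2):
    with ThreadPoolExecutor(max_workers=parallelism) as pool:
        model_futures = [pool.submit(fit, train, p) for p in params]
        # Barrier: wait until every fit is done before releasing training data.
        wait(model_futures)
        train.unpersist(blocking=False)
        # Evaluation starts only after the training data has been released.
        metric_futures = [
            pool.submit(evaluate, f.result(), validation) for f in model_futures
        ]
        return [f.result() for f in metric_futures]
```

    Because the unpersist call sits behind the barrier on the fit futures, the cache is released before any evaluation runs; with a non-blocking unpersist the call itself returns immediately, so issuing it from the driver or from a training thread mostly changes where, not how long, the caller waits.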
