GitHub user hhbyyh opened a pull request:

    https://github.com/apache/spark/pull/18733

    [SPARK-21535][ML] Reduce memory requirement for CrossValidator and TrainValidationSplit

    ## What changes were proposed in this pull request?
    
    CrossValidator and TrainValidationSplit both use
    `models = est.fit(trainingDataset, epm)`
    to fit the models, where `epm` is an `Array[ParamMap]`.
    Even though training is sequential, the current implementation keeps
    every trained model in driver memory. This is unnecessary and often
    leads to out-of-memory errors in both CrossValidator and
    TrainValidationSplit. My proposal is to optimize the training
    implementation so that each model can be collected by GC once it is no
    longer used, avoiding the unnecessary OOM exceptions.
    
    E.g. when the grid search space has 12 parameter combinations, the old
    implementation needs to hold all 12 trained models in driver memory at
    the same time, while the new implementation only needs to hold 1 trained
    model at a time, and the previous model can be reclaimed by GC.
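
The difference between the two patterns can be sketched in plain Scala without Spark. This is a toy illustration under stated assumptions, not the code in this patch: `ToyModel`, `fitOne`, and `evaluate` are hypothetical stand-ins for a fitted model, a single-`ParamMap` fit, and the evaluator, and the grid is reduced to an array of doubles.

```scala
object MemorySketch {
  // Hypothetical stand-in for a fitted model: just a payload of weights.
  final case class ToyModel(weights: Array[Double])

  // Hypothetical trainer: pretends to fit one model for one parameter value.
  def fitOne(regParam: Double): ToyModel =
    ToyModel(Array.fill(4)(regParam))

  // Hypothetical evaluator: reduces a fitted model to a single metric.
  def evaluate(model: ToyModel): Double = model.weights.sum

  // Old pattern: fit every candidate first, so grid.length models
  // are referenced at the same time before any metric is computed.
  def metricsAllAtOnce(grid: Array[Double]): Array[Double] = {
    val models = grid.map(fitOne)
    models.map(evaluate)
  }

  // New pattern: fit, evaluate, and drop each model in turn.
  // Only the metric escapes the closure, so each model becomes
  // unreachable (and eligible for GC) before the next fit starts.
  def metricsOneAtATime(grid: Array[Double]): Array[Double] =
    grid.map { p =>
      val model = fitOne(p)
      evaluate(model)
    }
}
```

Both functions return the same metrics; they differ only in how many models are reachable at once, which is why the change can reduce peak memory without altering results.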
    
    ## How was this patch tested?
    
    Existing unit tests, since there is no change to the logic.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/hhbyyh/spark singleModel

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/18733.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #18733
    
----
commit a7667e72d78f679b9693e22742e8a624b6348fd2
Author: Yuhao Yang <[email protected]>
Date:   2017-07-25T21:41:17Z

    memory optimization

----

