GitHub user idigary opened a pull request:

    https://github.com/apache/spark/pull/11027

    [SPARK-13132] [MLlib] cache standardization param value in LogisticRegression

    Cache the value of the standardization Param in LogisticRegression,
    rather than re-fetching it from the ParamMap for every index at every
    optimization step in the quasi-Newton optimizer.
    
    Also change Param#toString to return a cached string representation,
    rather than re-interpolating it on every call, so that other
    implementations with similar repeated-access patterns benefit as well.
    
    This change reduces the training time for one of my test sets from
    ~7m30s to ~4m30s.
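
The first change can be illustrated with a minimal Scala sketch. It is not the code from the patch: `CountingParams`, `getStandardization`, and both penalty functions are hypothetical stand-ins, and the penalty is a simplified L2 term. It only shows the general pattern of reading a parameter once before a hot loop instead of once per index.

```scala
// Hypothetical, simplified stand-in for a map-backed parameter store.
// It counts lookups so the difference between the two variants is visible.
final class CountingParams(standardization: Boolean) {
  var lookups = 0
  def getStandardization: Boolean = { lookups += 1; standardization }
}

object RegularizationSketch {
  // Shared per-coefficient term of the simplified L2 penalty (an assumption,
  // not Spark's exact formula).
  private def term(w: Double, std: Double, standardize: Boolean): Double =
    if (standardize) w * w
    else if (std != 0.0) (w * w) / (std * std)
    else 0.0

  // Re-fetches the parameter for every coefficient index.
  def l2PenaltyRefetching(params: CountingParams, coef: Array[Double],
                          std: Array[Double], regParam: Double): Double = {
    var sum = 0.0
    var i = 0
    while (i < coef.length) {
      sum += term(coef(i), std(i), params.getStandardization)
      i += 1
    }
    0.5 * regParam * sum
  }

  // Fetches the parameter once and reuses the local value inside the loop.
  def l2PenaltyCached(params: CountingParams, coef: Array[Double],
                      std: Array[Double], regParam: Double): Double = {
    val standardize = params.getStandardization
    var sum = 0.0
    var i = 0
    while (i < coef.length) {
      sum += term(coef(i), std(i), standardize)
      i += 1
    }
    0.5 * regParam * sum
  }
}
```

In this sketch the refetching variant performs one lookup per coefficient on every evaluation, while the cached variant performs one lookup per evaluation; both return the same value.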

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/idibon/spark spark-13132-optimize-logistic-regression

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/11027.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #11027
    
----
commit 895facff8c0cae585726f28ed50a082daa5bfb81
Author: Gary King <[email protected]>
Date:   2016-02-02T05:27:58Z

    [spark-13132/ml] optimize parameter fetch in LogisticRegression
    
    improve LogisticRegression training times by ~35%-45% by caching
    the model standardization enable parameter within the regularization
    closure, rather than repeatedly referencing it from the set /
    default maps

commit 6790e359eabc0a9e6d7b395fb8b7723c232a71ae
Author: Gary King <[email protected]>
Date:   2016-02-02T16:15:10Z

    ml/params: cache stringified versions of Params
    
    repeated lookup of parameter values within ParamMaps was causing
    a significant (35-45%) performance hit within LogisticRegression
    (SPARK-13132) due to the string interpolation performed by every
    call to hashCode.
    
    cache the stringified representation of the Param in a private
    instance variable, so that the string interpolation only happens
    once
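
As a rough illustration of the second commit, the Scala sketch below contrasts a key type whose hashCode rebuilds an interpolated string on every call with one that builds the string once. `UncachedParam`, `CachedParam`, and the `parent__name` string layout are assumptions made for this example, not a copy of Spark's Param class.

```scala
// Hypothetical key type: every hashCode call goes through toString,
// which re-runs the string interpolation and allocates a new String.
final class UncachedParam(val parent: String, val name: String) {
  override def toString: String = s"${parent}__$name"
  override def hashCode: Int = toString.hashCode
  override def equals(obj: Any): Boolean = obj match {
    case p: UncachedParam => p.toString == toString
    case _ => false
  }
}

// Same key type, but the string is interpolated once at construction
// and stored in a private instance field.
final class CachedParam(val parent: String, val name: String) {
  private[this] val stringRepresentation = s"${parent}__$name"
  override def toString: String = stringRepresentation
  override def hashCode: Int = toString.hashCode
  override def equals(obj: Any): Boolean = obj match {
    case p: CachedParam => p.toString == toString
    case _ => false
  }
}
```

Both classes hash and compare identically, so either works as a map key; the cached variant simply avoids repeating the interpolation on each lookup.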

----

