GitHub user idigary opened a pull request:
https://github.com/apache/spark/pull/11027
[SPARK-13132] [MLlib] cache standardization param value in LogisticRegression
Cache the value of the standardization Param in LogisticRegression, rather
than re-fetching it from the ParamMap for every index and every optimization
step in the quasi-Newton optimizer.
Also, fix Param#toString to cache the stringified representation, rather
than re-interpolating it on every call, so that other implementations with
similar repeated access patterns will also benefit.
This change improves training time for one of my test sets from ~7m30s to
~4m30s.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/idibon/spark spark-13132-optimize-logistic-regression
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/11027.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #11027
----
commit 895facff8c0cae585726f28ed50a082daa5bfb81
Author: Gary King <[email protected]>
Date: 2016-02-02T05:27:58Z
[spark-13132/ml] optimize parameter fetch in LogisticRegression
improve LogisticRegression training times by ~35%-45% by caching
the model standardization enable parameter within the regularization
closure, rather than repeatedly referencing it from the set /
default maps
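
The idea in this commit can be illustrated with a minimal sketch. This is not the patch itself: `Settings`, `penaltyUncached`, `penaltyCached` and the penalty arithmetic are hypothetical stand-ins, chosen only to show the difference between consulting the set / default maps on every index and reading the flag once before the loop.

```scala
import scala.collection.mutable

object HoistParamSketch {
  // Hypothetical parameter store (not Spark's ParamMap): an explicit-value
  // map that falls back to a default map, plus a counter of map lookups.
  final class Settings {
    private val explicit = mutable.Map.empty[String, Boolean]
    private val defaults = mutable.Map("standardization" -> true)
    var lookups = 0L

    def get(name: String): Boolean = {
      lookups += 1
      explicit.getOrElse(name, defaults(name))
    }
  }

  // Consults the maps once per coefficient index.
  def penaltyUncached(s: Settings, coefs: Array[Double], std: Array[Double]): Double = {
    var sum = 0.0
    var i = 0
    while (i < coefs.length) {
      // Illustrative arithmetic only; the real regularization term differs.
      val w = if (s.get("standardization")) coefs(i) else coefs(i) / std(i)
      sum += w * w
      i += 1
    }
    sum
  }

  // Reads the flag once into a local value that the loop then reuses.
  def penaltyCached(s: Settings, coefs: Array[Double], std: Array[Double]): Double = {
    val standardize = s.get("standardization")
    var sum = 0.0
    var i = 0
    while (i < coefs.length) {
      val w = if (standardize) coefs(i) else coefs(i) / std(i)
      sum += w * w
      i += 1
    }
    sum
  }
}
```

Both functions return the same value; the cached variant performs a single lookup per call, regardless of the number of coefficients.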
commit 6790e359eabc0a9e6d7b395fb8b7723c232a71ae
Author: Gary King <[email protected]>
Date: 2016-02-02T16:15:10Z
ml/params: cache stringified versions of Params
repeated lookup of parameter values within ParamMaps was causing
a significant (35-45%) performance hit within LogisticRegression
(SPARK-13132) due to the string interpolation performed by every
call to hashCode.
cache the stringified representation of the Param in a private
instance variable, so that the string interpolation only happens
once
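
A minimal sketch of this caching pattern follows. It assumes, as the commit message describes, that hashCode is derived from the string form of the key. `Key`, its `parent` and `name` fields, and the `__` separator are hypothetical and used only for illustration; this is not the actual Param class.

```scala
object CachedToStringSketch {
  // Hypothetical map key identified by a parent id and a name.
  final class Key(val parent: String, val name: String) {
    // Interpolated once at construction and stored in a private field,
    // instead of being rebuilt on every toString / hashCode call.
    private val stringRepresentation: String = s"${parent}__$name"

    override def toString: String = stringRepresentation

    // Hashing reuses the cached string, so map lookups do no interpolation.
    override def hashCode: Int = stringRepresentation.hashCode

    override def equals(other: Any): Boolean = other match {
      case k: Key => k.parent == parent && k.name == name
      case _      => false
    }
  }
}
```

The trade-off is one extra String reference per key instance in exchange for avoiding a fresh allocation on every map lookup.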
----