GitHub user falaki opened a pull request:
https://github.com/apache/spark/pull/1560
[WIP] [MLLIB] Ordinary least squares implementation
__This is work in progress__
Along with other statistical summaries, we are interested in getting summary
statistics for a linear regression model:
* Standard deviation (standard error) of the estimated parameters
* t-statistic and p-value for testing whether each parameter is equal
to zero
* R-squared and Adjusted R-squared
Also possibly:
* AIC
* Leverage
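As a rough illustration of the goodness-of-fit items listed above (R-squared, adjusted R-squared, AIC), here is a minimal Python sketch. It is not code from this pull request. The function name `fit_summary` and its arguments are hypothetical, the model is assumed to include an intercept, and the AIC assumes Gaussian errors and counts the error variance as one extra parameter (conventions differ on this).

```python
import math


def fit_summary(y, y_hat, num_params):
    """Goodness-of-fit summaries for a fitted linear model (illustrative only).

    y          -- observed responses
    y_hat      -- fitted values from the model
    num_params -- number of estimated coefficients, including the intercept
    """
    n = len(y)
    mean_y = sum(y) / n
    # Residual and total sums of squares.
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    tss = sum((yi - mean_y) ** 2 for yi in y)

    r2 = 1.0 - rss / tss
    # Penalize R-squared for the number of fitted coefficients.
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - num_params)

    # Gaussian log-likelihood evaluated at the ML variance estimate rss / n.
    log_lik = -0.5 * n * (math.log(2 * math.pi) + math.log(rss / n) + 1.0)
    # Assumption: the error variance counts as one additional parameter.
    aic = 2 * (num_params + 1) - 2 * log_lik

    return {"r2": r2, "adj_r2": adj_r2, "aic": aic}


if __name__ == "__main__":
    summary = fit_summary([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.1, 3.9], num_params=2)
    print(summary)
```

The function only needs observed and fitted values, so it is independent of how the coefficients were obtained.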
We will use SVD to solve the OLS problem and estimate the statistics above.
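For reference, the standard way these quantities fall out of an SVD can be sketched as follows. The notation is an assumption, not taken from the patch: $X$ is the $n \times p$ design matrix with full column rank, $y$ the response vector, and $X = U \Sigma V^{\top}$ the thin SVD with singular values $\sigma_1, \dots, \sigma_p$.

```latex
\begin{align}
  \hat{\beta} &= V \Sigma^{-1} U^{\top} y \\
  \hat{\sigma}^{2} &= \frac{\lVert y - X\hat{\beta} \rVert^{2}}{n - p} \\
  \widehat{\operatorname{Cov}}(\hat{\beta})
    &= \hat{\sigma}^{2} (X^{\top} X)^{-1}
     = \hat{\sigma}^{2} \, V \Sigma^{-2} V^{\top} \\
  \operatorname{se}(\hat{\beta}_j)
    &= \sqrt{\hat{\sigma}^{2} \sum_{k=1}^{p} \frac{V_{jk}^{2}}{\sigma_k^{2}}} \\
  t_j &= \frac{\hat{\beta}_j}{\operatorname{se}(\hat{\beta}_j)},
  \qquad
  p_j = 2 \, \Pr\!\left( T_{n-p} \ge \lvert t_j \rvert \right) \\
  H &= X (X^{\top} X)^{-1} X^{\top} = U U^{\top},
  \qquad
  h_{ii} = \sum_{k=1}^{p} U_{ik}^{2}
\end{align}
```

Here $T_{n-p}$ is a Student t variable with $n - p$ degrees of freedom and $h_{ii}$ is the leverage of observation $i$. One appeal of this route is that a single decomposition yields the coefficients, their standard errors, and the leverages without explicitly forming or inverting $X^{\top} X$.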
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/falaki/spark LLS
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/1560.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #1560
----
commit 2715c40769604d2327d4845044b5b3a15b7a7acb
Author: Hossein <[email protected]>
Date: 2014-07-23T22:20:15Z
First pass of OLS solution using SVD
commit f3454f9af9e881bd1d78f83797127ae3bcffa540
Author: Hossein <[email protected]>
Date: 2014-07-24T02:09:25Z
Added TODO
commit eb3995d05d860d601aa2c2502b4f1a7a5d119818
Author: Hossein <[email protected]>
Date: 2014-07-24T02:09:48Z
Fixed tests
----