GitHub user sandecho opened a pull request:
https://github.com/apache/spark/pull/20708
[SPARK-21209][MLLIB] Implement Incremental PCA algorithm
## What changes were proposed in this pull request?
This patch proposes a new feature, the Incremental Principal Component
Analysis (IPCA) algorithm. It splits the incoming data into batches of a given
batch size and computes the PCA of each batch in turn to produce the principal
components of the entire data set.
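
For readers unfamiliar with the idea, here is a minimal sketch of batch-wise PCA in plain Python. It is not the code in this patch and does not use any Spark API. The names `BatchStats` and `top_component` are hypothetical, and the technique shown (a running mean and scatter-matrix update followed by power iteration) is only one way to process data batch by batch; the patch may well merge batches differently.

```python
import math


class BatchStats:
    """Hypothetical helper: running mean and scatter matrix, fed batch by batch."""

    def __init__(self, dim):
        self.dim = dim
        self.n = 0
        self.mean = [0.0] * dim
        # Sum of (x - mean)(x - mean)^T over all rows seen so far.
        self.scatter = [[0.0] * dim for _ in range(dim)]

    def partial_fit(self, batch):
        """Fold one batch (a list of rows) into the running statistics."""
        for x in batch:
            self.n += 1
            before = [xi - mi for xi, mi in zip(x, self.mean)]
            self.mean = [mi + bi / self.n for mi, bi in zip(self.mean, before)]
            after = [xi - mi for xi, mi in zip(x, self.mean)]
            for i in range(self.dim):
                for j in range(self.dim):
                    self.scatter[i][j] += before[i] * after[j]
        return self

    def covariance(self):
        """Sample covariance of everything seen so far (needs n >= 2)."""
        return [[s / (self.n - 1) for s in row] for row in self.scatter]


def top_component(cov, iters=200):
    """First principal direction of a covariance matrix via power iteration."""
    dim = len(cov)
    v = [1.0] * dim  # assumed not orthogonal to the leading eigenvector
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(dim)) for i in range(dim)]
        norm = math.sqrt(sum(wi * wi for wi in w))
        if norm == 0.0:
            break
        v = [wi / norm for wi in w]
    return v


# Usage: two batches of points lying on the line y = 2x.
stats = BatchStats(dim=2)
stats.partial_fit([[1.0, 2.0], [2.0, 4.0]])
stats.partial_fit([[3.0, 6.0], [4.0, 8.0]])
component = top_component(stats.covariance())
```

Because only the count, mean, and scatter matrix are kept between calls, no batch has to stay in memory after it is processed, which is the main appeal of an incremental approach.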
## How was this patch tested?
Unit tests.
[IPCA.zip](https://github.com/apache/spark/files/1772562/IPCA.zip)
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/sandecho/spark IPCA
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/20708.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #20708
----
commit 7900d21138de542fd89763a68417d74792725afd
Author: Sandeep Kumar Choudhary <tssandeepkumarchoudhary@...>
Date: 2018-03-01T13:35:20Z
Implemented Incremental PCA
----