So yes, there is a difference between the two methods depending on the size of the matrix.
Following is the output from IPython.

*With a matrix of shape (1000 x 500)*

(batman3) tupui@Batman:Desktop $ ipython -i sk_pod.py
Python 3.6.5 | packaged by conda-forge | (default, Apr 6 2018, 13:44:09)
Type 'copyright', 'credits' or 'license' for more information
IPython 6.4.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]: %timeit pod._update(snapshot2.T)
491 ms ± 22.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [2]: %timeit ipca.partial_fit(snapshot2)
163 ms ± 1.6 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

*With a matrix of shape (1000 x 2000)*

(batman3) tupui@Batman:Desktop $ ipython -i sk_pod.py
Python 3.6.5 | packaged by conda-forge | (default, Apr 6 2018, 13:44:09)
Type 'copyright', 'credits' or 'license' for more information
IPython 6.4.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]: %timeit pod._update(snapshot2.T)
4.84 s ± 220 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [2]: %timeit ipca.partial_fit(snapshot2)
5.85 s ± 77.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

*With a matrix of shape (1000 x 20000)*

(batman3) tupui@Batman:Desktop $ ipython -i sk_pod.py
Python 3.6.5 | packaged by conda-forge | (default, Apr 6 2018, 13:44:09)
Type 'copyright', 'credits' or 'license' for more information
IPython 6.4.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]: %timeit pod._update(snapshot2.T)
3.39 s ± 65.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [2]: %timeit ipca.partial_fit(snapshot2)
33.1 s ± 17.7 s per loop (mean ± std. dev. of 7 runs, 1 loop each)

Conclusion: batman's update method seems faster at adding a single sample when the number of features exceeds the number of samples. But to add a bunch of samples at once, I found that sklearn seems a bit faster (38.75 s vs 34.51 s to add 10 samples to the 1000 x 20000 matrix). It is worth noting that, in this last case, sklearn takes about the same time (~30 s) to add a single sample or 10 of them. So depending on how many samples are to be added, batching the updates can help (see the sketch after the quoted message below).

Cheers,
Pamphile

P.S. Following is the code I used (requires batman, available through conda-forge):

import time

import numpy as np
from batman.pod import Pod
from sklearn.decomposition import IncrementalPCA

n_samples, n_features = 1000, 20000
snapshots = np.random.random_sample((n_samples, n_features))
snapshot2 = np.random.random_sample((1, n_features))

# Initial decomposition of the full snapshot matrix with both methods.
pod = Pod([np.zeros(n_features), np.ones(n_features)], None, np.inf, 1, 999)
pod._decompose(snapshots.T)

ipca = IncrementalPCA(999)
ipca.fit(snapshots)
np.allclose(ipca.singular_values_, pod.S)

# Add a single new sample to each decomposition and compare spectra.
pod._update(snapshot2.T)
ipca.partial_fit(snapshot2)
np.allclose(ipca.singular_values_[:999], pod.S[:999])

# Add 10 new samples: one at a time with Pod, as a single batch with sklearn.
snapshot3 = np.random.random_sample((10, n_features))

itime = time.time()
[pod._update(snap.T[:, None]) for snap in snapshot3]
print(time.time() - itime)

itime = time.time()
ipca.partial_fit(snapshot3)
print(time.time() - itime)

np.allclose(ipca.singular_values_[:999], pod.S[:999])

2018-07-03 11:06 GMT+02:00 Pamphile Roy <roy.pamph...@gmail.com>:

> I have no idea about the comparison with
> sklearn.decomposition.IncrementalPCA.
> Was not aware of this but from the code it seems to be a different
> approach.
> I will try to come up with some numbers.
>
> Pamphile
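P.P.S. Since the practical takeaway is about batching, here is a minimal, self-contained sketch of the pattern: buffer incoming samples into chunks and call partial_fit once per chunk, so the per-call SVD cost is amortized. The stream generator and the chunk_size/n_components values are made up for illustration, not part of the benchmark above; also note that, as far as I know, each partial_fit batch must contain at least n_components rows.

import numpy as np
from sklearn.decomposition import IncrementalPCA

n_features = 2000      # kept small so the sketch runs quickly
n_components = 50      # assumed; the benchmark above used 999
chunk_size = 100       # assumed buffer size, must be >= n_components

ipca = IncrementalPCA(n_components=n_components)

# Simulated stream of single samples; in practice these would arrive
# one by one from a simulation or an experiment.
stream = (np.random.random_sample(n_features) for _ in range(1000))

buffer = []
for sample in stream:
    buffer.append(sample)
    if len(buffer) == chunk_size:
        # One partial_fit per chunk instead of one per sample:
        # the SVD cost is paid once for the whole chunk.
        ipca.partial_fit(np.asarray(buffer))
        buffer.clear()

# Flush any leftover samples, provided the chunk is large enough
# to satisfy the n_components <= n_samples constraint.
if len(buffer) >= n_components:
    ipca.partial_fit(np.asarray(buffer))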