[GitHub] incubator-spark pull request: Principal Component Analysis

2014-02-09 Thread sscdotopen
Github user sscdotopen commented on the pull request: https://github.com/apache/incubator-spark/pull/564#issuecomment-34583075 Centering a sparse matrix, as is done in textbook PCA, is a serious scalability bottleneck because it densifies the input matrix. Not sure if you can apply the
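
To illustrate the densification point in the comment above, here is a minimal sketch in plain Python (not Spark or MLlib code). The toy matrix, the dict-per-row sparse layout, and the helper names `column_means`, `center_explicitly`, and `centered_matvec` are all hypothetical and chosen only for illustration. The last helper shows one commonly used way to avoid materializing the centered matrix, namely folding the mean shift into the matrix-vector product; this is an assumption about a possible workaround, not necessarily what the truncated comment goes on to propose.

```python
# Toy sparse matrix: one dict per row, mapping column index -> nonzero value.
# (Hypothetical layout for illustration; not an MLlib data structure.)
rows = [
    {0: 3.0},
    {2: 6.0},
    {0: 3.0, 1: 3.0},
]
n_cols = 3


def column_means(rows, n_cols):
    """Column means computed by touching only the stored nonzeros."""
    sums = [0.0] * n_cols
    for row in rows:
        for j, v in row.items():
            sums[j] += v
    return [s / len(rows) for s in sums]


def center_explicitly(rows, n_cols):
    """Textbook centering: subtract the column mean from every entry.

    Every implicit zero becomes -mean, so the result has to be stored densely.
    """
    means = column_means(rows, n_cols)
    return [[row.get(j, 0.0) - means[j] for j in range(n_cols)] for row in rows]


def centered_matvec(rows, n_cols, x):
    """Compute (A - 1 * mean^T) x without building the centered matrix.

    Uses the identity (A - 1 * mean^T) x = A x - (mean . x) * 1.
    """
    means = column_means(rows, n_cols)
    shift = sum(m * xj for m, xj in zip(means, x))
    return [sum(v * x[j] for j, v in row.items()) - shift for row in rows]


nnz_before = sum(len(row) for row in rows)
centered = center_explicitly(rows, n_cols)
nnz_after = sum(1 for row in centered for v in row if v != 0.0)
print(nnz_before, nnz_after)
```

In this toy case the number of stored nonzeros grows from 4 to all 9 entries after explicit centering, while `centered_matvec` produces the same product as the dense centered matrix by iterating over the original nonzeros only.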

[GitHub] incubator-spark pull request: [Proposal] Adding sparse data suppor...

2014-02-10 Thread sscdotopen
Github user sscdotopen commented on the pull request: https://github.com/apache/incubator-spark/pull/575#issuecomment-34674441 I think making the heavyweight mahout-core a dependency just for access to the sparse vectors is not a good idea. A better way would be to just depend on