Hello Alex,
Thanks for the response. There isn't much other data on the driver, so the
issue is probably inherent to this particular PCA implementation. I'll try
the alternative approach that you suggested instead. Thanks again.
-Bharath
On Wed, Jan 13, 2016 at 11:24 PM, Alex Gittens wrote:
The PCA.fit function calls the RowMatrix PCA routine, which attempts to
construct the covariance matrix locally on the driver, and then computes
the SVD of that to get the PCs. I'm not sure what's causing the memory
error: RowMatrix.scala:124 is only using about 3.5 GB of memory (n*(n+1)/2
doubles with n=29604, at 8 bytes each).
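
For reference, a minimal Scala sketch of that call path using the lower-level
mllib API (the function name and the assumption of an existing RDD[Vector]
called rows are just for illustration):

import org.apache.spark.mllib.linalg.{Matrix, Vector}
import org.apache.spark.mllib.linalg.distributed.RowMatrix
import org.apache.spark.rdd.RDD

// rows: an existing RDD[Vector], one ~29604-dimensional vector per row.
def principalComponents(rows: RDD[Vector], k: Int = 100): Matrix = {
  val mat = new RowMatrix(rows)
  // This aggregates the Gram/covariance matrix into a single local array on
  // the driver (roughly n*(n+1)/2 doubles, ~3.5 GB for n = 29604) and then
  // takes a local SVD of the covariance to extract the top-k components.
  mat.computePrincipalComponents(k) // returned as a local n x k Matrix
}

Note that the driver has to hold the full covariance matrix regardless of how
sparse the input rows are.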
On 12-Jan-2016 2:06 pm, "Bharath Ravi Kumar" wrote:
We're running PCA (selecting 100 principal components) on a dataset that
has ~29K columns and is 70G in size stored in ~600 parts on HDFS. The
matrix in question is mostly sparse with tens of columns populated in most
rows, but a few rows with thousands of columns populated. We're running
spark on m
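
For context, here is a minimal sketch of the kind of job being described,
using the spark.ml PCA API from a 1.6-era spark-shell (the toy 6-dimensional
vectors and the column names are placeholders; the real input has ~29K
mostly-sparse columns and k = 100, which is what forces the large driver-side
covariance matrix discussed above):

import org.apache.spark.ml.feature.PCA
import org.apache.spark.mllib.linalg.Vectors

// Toy stand-in for the real input: a few rows of mostly-sparse vectors.
val data = Seq(
  (0L, Vectors.sparse(6, Array(1, 3), Array(1.0, 7.0))),
  (1L, Vectors.sparse(6, Array(0, 4), Array(2.0, 5.0))),
  (2L, Vectors.dense(4.0, 0.0, 0.0, 6.0, 7.0, 0.0))
)
val df = sqlContext.createDataFrame(data).toDF("id", "features")

val pca = new PCA()
  .setInputCol("features")
  .setOutputCol("pcaFeatures")
  .setK(3) // 100 in the actual job

// PCA.fit goes through the RowMatrix PCA routine discussed earlier in the thread.
val model = pca.fit(df)
model.transform(df).select("pcaFeatures").show()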