Hi Xiangrui,
Thanks a lot for your answer.
So I fixed my Julia code and also calculated PCA using R.
R code:
data <- read.csv('/home/upul/Desktop/iris.csv')
X <- data[,1:4]                                   # the four measurement columns
pca <- prcomp(X, center = TRUE, scale. = FALSE)   # centred, unscaled PCA
transformed <- predict(pca, newdata = X)          # project X onto the components
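Julia code (fixed):
data = readcsv("/home/upul/temp/iris.csv");
X = data[:,1:end-1];              # drop the last (label) column
meanX = mean(X,1);                # 1 x n row vector of column means
m,n = size(X);
X = X - repmat(meanX, m,1);       # centre each column
u,s,v = svd(X);                   # SVD of the centred data
transformed = X*v;                # principal component scores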
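A quick sanity check on the fixed code: because `X` is centred, the scores `X*v` equal `u*Diagonal(s)`, and the variance of each score column is `s[i]^2/(m-1)`, which is what R reports as `pca$sdev^2`. This is a minimal sketch, assuming Julia 1.x (where broadcasting `.-` replaces `repmat`) and a small made-up matrix in place of the Iris file.

```julia
using LinearAlgebra, Statistics

# Hypothetical toy data (rows = samples, columns = features); not the real Iris file.
X = [5.1 3.5 1.4;
     4.9 3.0 1.4;
     6.3 3.3 6.0;
     5.8 2.7 5.1;
     7.0 3.2 4.7;
     6.4 3.2 4.5]

m = size(X, 1)
Xc = X .- mean(X, dims=1)      # centre each column (replaces repmat)
u, s, v = svd(Xc)              # thin SVD of the centred data
transformed = Xc * v           # principal component scores

# Scores are the left singular vectors scaled by the singular values.
println(transformed ≈ u * Diagonal(s))
# Per-component variance s_i^2 / (m - 1) equals the sample variance of each score column.
println(vec(var(transformed, dims=1)) ≈ s .^ 2 ./ (m - 1))
```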
Now the PCA results from Julia and R are identical, but I can still see a
small difference between the PCA values given by Spark and the other two.
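The thread does not pin down the remaining Spark difference. Two generic effects that can make otherwise correct PCA results disagree are sketched below: each component is only defined up to its sign, and projecting rows that were not centred shifts every score column by the constant `mean(X) * v`. Whether either applies to the MLlib version used here is an assumption to check against its documentation; the data and variable names are illustrative (Julia 1.x).

```julia
using LinearAlgebra, Statistics

# Hypothetical toy data; not the real Iris file.
X = [5.1 3.5 1.4;
     4.9 3.0 1.4;
     6.3 3.3 6.0;
     5.8 2.7 5.1;
     7.0 3.2 4.7;
     6.4 3.2 4.5]

Xc = X .- mean(X, dims=1)
_, _, v = svd(Xc)

centred_scores   = Xc * v               # what the fixed Julia code and R's prcomp/predict compute
uncentred_scores = X * v                # raw rows projected onto the same components
offset = mean(X, dims=1) * v            # 1 x n row vector

# The two projections differ by the same constant in every row of a given column.
println(uncentred_scores ≈ centred_scores .+ offset)

# Flipping the sign of a component gives an equally valid PCA solution.
v_flipped = v * Diagonal([-1.0, 1.0, 1.0])
println(abs.(Xc * v_flipped) ≈ abs.(centred_scores))
```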
Thanks,
Upul
On Sat, Jan 10, 2015 at 11:17 AM, Xiangrui Meng men...@gmail.com wrote:
You need to subtract the mean values to obtain the covariance matrix
(http://en.wikipedia.org/wiki/Covariance_matrix).
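In code, subtracting the column means before forming `X'*X` gives the usual sample covariance. A minimal sketch in Julia 1.x with made-up data; `Statistics.cov` is used only as a cross-check and applies the same `m - 1` normalisation.

```julia
using Statistics

# Hypothetical toy data (rows = samples); not the real Iris file.
X = [5.1 3.5 1.4;
     4.9 3.0 1.4;
     6.3 3.3 6.0;
     5.8 2.7 5.1;
     7.0 3.2 4.7;
     6.4 3.2 4.5]

m = size(X, 1)
Xc = X .- mean(X, dims=1)        # subtract each column's mean
C = Xc' * Xc / (m - 1)           # sample covariance matrix
println(C ≈ cov(X))              # cov treats columns as variables by default
```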
On Fri, Jan 9, 2015 at 6:41 PM, Upul Bandara upulband...@gmail.com wrote:
Hi Xiangrui,
Thanks for the reply.
The Julia code is also using the covariance matrix:
(1/n)*X'*X;
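Expanding this expression shows why it matches the covariance only for centred data. Write the n-row data matrix as X = X_c + 1μ, where μ is the row vector of column means, 1 is a column of n ones, and the columns of X_c sum to zero:

```latex
\begin{aligned}
\frac{1}{n} X^\top X
  &= \frac{1}{n} (X_c + \mathbf{1}\mu)^\top (X_c + \mathbf{1}\mu) \\
  &= \frac{1}{n} X_c^\top X_c
     + \frac{1}{n} X_c^\top \mathbf{1}\,\mu
     + \frac{1}{n} \mu^\top \mathbf{1}^\top X_c
     + \frac{1}{n} \mu^\top \mathbf{1}^\top \mathbf{1}\,\mu \\
  &= \frac{1}{n} X_c^\top X_c + \mu^\top \mu
     \qquad (\text{since } \mathbf{1}^\top X_c = 0,\ \mathbf{1}^\top \mathbf{1} = n).
\end{aligned}
```

So (1/n) X'X is the covariance matrix plus the rank-one term μ'μ, and the two agree only when μ = 0.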
Thanks,
Upul
On Fri, Jan 9, 2015 at 2:11 AM, Xiangrui Meng men...@gmail.com wrote:
The Julia code is computing the SVD of the Gram matrix. PCA should be
applied to the covariance matrix. -Xiangrui
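A small numerical illustration of the Gram-versus-covariance difference: on uncentred data the leading singular vector of the Gram matrix can be pulled toward the mean vector instead of the direction of greatest variance. Julia 1.x with made-up data; `G`, `C`, `g1` and `c1` are illustrative names.

```julia
using LinearAlgebra, Statistics

# Hypothetical toy data (rows = samples); not the real Iris file.
X = [5.1 3.5 1.4;
     4.9 3.0 1.4;
     6.3 3.3 6.0;
     5.8 2.7 5.1;
     7.0 3.2 4.7;
     6.4 3.2 4.5]

n = size(X, 1)
G = X' * X / n                       # scaled Gram matrix of the raw (uncentred) data
C = cov(X, corrected=false)          # covariance with the same 1/n scaling, but centred
g1 = svd(G).U[:, 1]                  # leading direction from the Gram matrix
c1 = svd(C).U[:, 1]                  # leading principal component

println(abs(dot(g1, c1)))            # 1.0 would mean the same direction
# Gram matrix = covariance + outer product of the column means.
println(G ≈ C + mean(X, dims=1)' * mean(X, dims=1))
```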
On Thu, Jan 8, 2015 at 8:27 AM, Upul Bandara upulband...@gmail.com wrote:
Hi All,
I tried to do PCA for the Iris dataset
[https://archive.ics.uci.edu/ml/datasets/Iris] using MLlib
[http://spark.apache.org/docs/1.1.1/mllib-dimensionality-reduction.html].
Also, I calculated PCA in Julia using the following method:
Sigma = (1/size(X,1))*X'*X;   # X: one sample per row, not mean-centred here
U, S, V = svd(Sigma);
Ureduced = U[:, 1:k];         # keep the first k components
Z = X*Ureduced;               # project the data
However, I'm seeing a small difference between the values given by MLlib
and the method shown above.
Does anyone have any idea where this difference comes from?
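For comparison, the method above with the mean subtraction Xiangrui describes added might look like the sketch below (Julia 1.x, made-up data, and `k = 2` chosen only for illustration). Using `1/n` or `1/(n-1)` only rescales `Sigma`, so it does not change the components; the final check compares against the SVD of the centred data, up to column signs.

```julia
using LinearAlgebra, Statistics

# Hypothetical toy data (rows = samples); not the real Iris file.
X = [5.1 3.5 1.4;
     4.9 3.0 1.4;
     6.3 3.3 6.0;
     5.8 2.7 5.1;
     7.0 3.2 4.7;
     6.4 3.2 4.5]

k = 2                                  # number of components to keep (illustrative)
m = size(X, 1)

Xc = X .- mean(X, dims=1)              # subtract the column means first
Sigma = (1 / m) * Xc' * Xc             # covariance matrix (1/m scaling)
U, S, V = svd(Sigma)                   # Sigma is symmetric PSD, so U holds its eigenvectors
Ureduced = U[:, 1:k]
Z = Xc * Ureduced                      # project the centred data

# Cross-check: same scores as the SVD of the centred data, up to the sign of each column.
_, _, Vd = svd(Xc)
println(abs.(Z) ≈ abs.(Xc * Vd[:, 1:k]))
```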
Additionally, I have attached two visualizations, one for each approach.
Thanks,
Upul