Could you compare V directly and tell us more about the difference you saw? The columns of V should be the same up to sign. For example, the first column of V could be either [0.8, -0.6, 0.0] or [-0.8, 0.6, 0.0].

-Xiangrui
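
To make the comparison concrete, here is a minimal sketch in plain Python of how two V matrices could be compared up to sign. The helper names (align_signs, max_abs_diff) and the second column of each matrix are made up for illustration; only the first column comes from the example above. It assumes both tools return the principal directions in the same order.

```python
def align_signs(columns, reference):
    """Flip each column so it points the same way as the matching reference column."""
    aligned = []
    for col, ref in zip(columns, reference):
        dot = sum(c * r for c, r in zip(col, ref))
        sign = -1.0 if dot < 0 else 1.0
        aligned.append([sign * c for c in col])
    return aligned


def max_abs_diff(a, b):
    """Largest element-wise absolute difference between two lists of columns."""
    return max(abs(x - y) for ca, cb in zip(a, b) for x, y in zip(ca, cb))


# Each inner list is one column of V (one principal direction).
# The first column is the example from this message; the second is hypothetical.
v_spark = [[0.8, -0.6, 0.0], [0.6, 0.8, 0.0]]
v_other = [[-0.8, 0.6, 0.0], [0.6, 0.8, 0.0]]

raw_diff = max_abs_diff(v_spark, v_other)
aligned_diff = max_abs_diff(align_signs(v_spark, v_other), v_other)
print(raw_diff, aligned_diff)
```

If the difference is still noticeable after aligning signs, it comes from something other than the sign ambiguity, and the actual numbers would help narrow it down.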
On Sat, Jan 10, 2015 at 8:08 PM, Upul Bandara <upulband...@gmail.com> wrote:
> Hi Xiangrui,
>
> Thanks a lot for your answer.
> So I fixed my Julia code and also calculated PCA using R.
>
> R Code:
> -------------
> data <- read.csv('/home/upul/Desktop/iris.csv');
> X <- data[,1:4]
> pca <- prcomp(X, center = TRUE, scale=FALSE)
> transformed <- predict(pca, newdata = X)
>
> Julia Code (Fixed)
> --------------
> data = readcsv("/home/upul/temp/iris.csv");
> X = data[:,1:end-1];
> meanX = mean(X,1);
> m,n = size(X);
> X = X - repmat(meanX, m,1);
> u,s,v = svd(X);
> transformed = X*v;
>
> Now the PCA results from Julia and R are identical, but I can still see a
> small difference between the PCA values given by Spark and the other two.
>
> Thanks,
> Upul
>
> On Sat, Jan 10, 2015 at 11:17 AM, Xiangrui Meng <men...@gmail.com> wrote:
>>
>> You need to subtract mean values to obtain the covariance matrix
>> (http://en.wikipedia.org/wiki/Covariance_matrix).
>>
>> On Fri, Jan 9, 2015 at 6:41 PM, Upul Bandara <upulband...@gmail.com>
>> wrote:
>> > Hi Xiangrui,
>> >
>> > Thanks for the reply.
>> >
>> > Julia code is also using the covariance matrix:
>> > (1/n)*X'*X ;
>> >
>> > Thanks,
>> > Upul
>> >
>> > On Fri, Jan 9, 2015 at 2:11 AM, Xiangrui Meng <men...@gmail.com> wrote:
>> >>
>> >> The Julia code is computing the SVD of the Gram matrix. PCA should be
>> >> applied to the covariance matrix. -Xiangrui
>> >>
>> >> On Thu, Jan 8, 2015 at 8:27 AM, Upul Bandara <upulband...@gmail.com>
>> >> wrote:
>> >> > Hi All,
>> >> >
>> >> > I tried to do PCA for the Iris dataset
>> >> > [https://archive.ics.uci.edu/ml/datasets/Iris] using MLLib
>> >> > [http://spark.apache.org/docs/1.1.1/mllib-dimensionality-reduction.html].
>> >> > Also, PCA was calculated in Julia using the following method:
>> >> >
>> >> > Sigma = (1/numRow(X))*X'*X ;
>> >> > [U, S, V] = svd(Sigma);
>> >> > Ureduced = U(:, 1:k);
>> >> > Z = X*Ureduced;
>> >> >
>> >> > However, I'm seeing a small difference between the values given by
>> >> > MLLib and the method shown above.
>> >> >
>> >> > Does anyone have any idea about this difference?
>> >> >
>> >> > Additionally, I have attached two visualizations, related to the two
>> >> > approaches.
>> >> >
>> >> > Thanks,
>> >> > Upul
>> >> >
>> >> > ---------------------------------------------------------------------
>> >> > To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>> >> > For additional commands, e-mail: user-h...@spark.apache.org