Re: Discrepancy in PCA values

2015-01-10 Thread Upul Bandara
Hi Xiangrui,

Thanks a lot for your answer.
I fixed my Julia code and also calculated PCA using R.

R Code:
-------
data <- read.csv('/home/upul/Desktop/iris.csv');
X <- data[,1:4]
pca <- prcomp(X, center = TRUE, scale=FALSE)
transformed <- predict(pca, newdata = X)

Julia Code (Fixed)
------------------
data = readcsv("/home/upul/temp/iris.csv");
X = data[:,1:end-1];
meanX = mean(X,1);
m,n = size(X);
X = X - repmat(meanX, m,1);
u,s,v = svd(X);
transformed =  X*v;

Now the PCA values calculated using Julia and R are identical, but I can still
see a small difference between the PCA values given by Spark and the other two.

Thanks,
Upul
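
One possible source of a remaining gap, sketched under assumptions: the thread does not show the Spark code, so this supposes the Spark scores came from multiplying the original, uncentered rows by the principal components, whereas R's predict() centers the rows first. In that case every score along a component would differ by the same constant (the column means projected onto that component), and a component's sign may also be flipped between libraries. The toy data and the direction v below are made up for illustration; v is not a real principal component of Iris.

```python
def project(rows, direction):
    """Dot product of every row with one direction vector."""
    return [sum(x * w for x, w in zip(r, direction)) for r in rows]

# Made-up toy data (not the Iris file from the thread).
X = [[5.1, 3.5], [4.9, 3.0], [6.2, 2.9], [5.9, 3.0]]
n = len(X)
mu = [sum(col) / n for col in zip(*X)]            # column means
Xc = [[x - m for x, m in zip(r, mu)] for r in X]  # centered rows

v = [0.6, 0.8]  # hypothetical unit-length direction, for illustration only

raw_scores = project(X, v)        # projecting uncentered rows
centered_scores = project(Xc, v)  # projecting centered rows

# Every raw score exceeds its centered score by the same amount: mu . v
offset = sum(m * w for m, w in zip(mu, v))
diffs = [a - b for a, b in zip(raw_scores, centered_scores)]

# Flipping the sign of the direction flips the sign of every score.
flipped_scores = project(Xc, [-w for w in v])
```

If the Spark values differ from the R and Julia values by a per-column constant or by a sign, this would account for it; if the gap varies from row to row, the cause lies elsewhere.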

On Sat, Jan 10, 2015 at 11:17 AM, Xiangrui Meng men...@gmail.com wrote:

 You need to subtract mean values to obtain the covariance matrix
 (http://en.wikipedia.org/wiki/Covariance_matrix).
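
 To illustrate the point about subtracting the means, here is a minimal sketch on made-up toy data (not the Iris file). It compares (1/n)*X'*X computed on raw rows with the same product computed on mean-centered rows. The two matrices differ by the outer product of the mean vector, so they generally have different eigenvectors. The helper names are hypothetical, and the 1/n scaling follows the thread; libraries that divide by n-1 instead only rescale the matrix, which leaves the component directions unchanged.

```python
def column_means(rows):
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def scaled_gram(rows):
    """(1/n) * X'X computed on the rows exactly as given."""
    n = len(rows)
    d = len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) / n for j in range(d)]
            for i in range(d)]

def covariance(rows):
    """(1/n) * Xc'Xc, where Xc has the column means subtracted."""
    mu = column_means(rows)
    centered = [[x - m for x, m in zip(r, mu)] for r in rows]
    return scaled_gram(centered)

# Made-up toy data with two features.
X = [[5.1, 3.5], [4.9, 3.0], [6.2, 2.9], [5.9, 3.0]]
mu = column_means(X)
G = scaled_gram(X)   # what (1/n)*X'*X gives without centering
C = covariance(X)    # covariance matrix (1/n convention)

# G - C equals the outer product mu * mu'
gap = [[G[i][j] - C[i][j] for j in range(2)] for i in range(2)]
outer = [[mu[i] * mu[j] for j in range(2)] for i in range(2)]
```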

 On Fri, Jan 9, 2015 at 6:41 PM, Upul Bandara upulband...@gmail.com
 wrote:
  Hi Xiangrui,
 
  Thanks for the reply.
 
  Julia code is also using the covariance matrix:
  (1/n)*X'*X ;
 
  Thanks,
  Upul
 
  On Fri, Jan 9, 2015 at 2:11 AM, Xiangrui Meng men...@gmail.com wrote:
 
  The Julia code is computing the SVD of the Gram matrix. PCA should be
  applied to the covariance matrix. -Xiangrui
 
  On Thu, Jan 8, 2015 at 8:27 AM, Upul Bandara upulband...@gmail.com
  wrote:
   Hi All,
  
   I tried to do PCA for the Iris dataset
   [https://archive.ics.uci.edu/ml/datasets/Iris] using MLLib
  
   [http://spark.apache.org/docs/1.1.1/mllib-dimensionality-reduction.html].
   Also, PCA  was calculated in Julia using following method:
  
   Sigma = (1/numRow(X))*X'*X ;
   [U, S, V] = svd(Sigma);
   Ureduced = U(:, 1:k);
   Z = X*Ureduced;
  
   However, I'm seeing a little difference between values given by MLLib
   and the method shown above.
  
   Does anyone have any idea about this difference?
  
   Additionally, I have attached two visualizations, related to two
   approaches.
  
   Thanks,
   Upul
  
  
  
   -
   To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
   For additional commands, e-mail: user-h...@spark.apache.org
 
 



Re: Discrepancy in PCA values

2015-01-09 Thread Upul Bandara
Hi Xiangrui,

Thanks for the reply.

Julia code is also using the covariance matrix:
(1/n)*X'*X ;

Thanks,
Upul

On Fri, Jan 9, 2015 at 2:11 AM, Xiangrui Meng men...@gmail.com wrote:

 The Julia code is computing the SVD of the Gram matrix. PCA should be
 applied to the covariance matrix. -Xiangrui

 On Thu, Jan 8, 2015 at 8:27 AM, Upul Bandara upulband...@gmail.com
 wrote:
  Hi All,
 
  I tried to do PCA for the Iris dataset
  [https://archive.ics.uci.edu/ml/datasets/Iris] using MLLib
  [http://spark.apache.org/docs/1.1.1/mllib-dimensionality-reduction.html].
  Also, PCA  was calculated in Julia using following method:
 
  Sigma = (1/numRow(X))*X'*X ;
  [U, S, V] = svd(Sigma);
  Ureduced = U(:, 1:k);
  Z = X*Ureduced;
 
  However, I'm seeing a little difference between values given by MLLib and
  the method shown above.
 
  Does anyone have any idea about this difference?
 
  Additionally, I have attached two visualizations, related to two
 approaches.
 
  Thanks,
  Upul
 
 
 