Re: Discrepancy in PCA values

Xiangrui Meng Mon, 12 Jan 2015 14:01:14 -0800

Could you compare V directly and tell us more about the difference you
saw? The columns of V should be the same up to sign. For example,
the first column of V could be either [0.8, -0.6, 0.0] or [-0.8, 0.6,
0.0]. -Xiangrui
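
A minimal sketch of the comparison suggested above, assuming the two V matrices have already been exported as plain lists of rows. The helper names (align_signs, max_abs_diff) and the numbers are hypothetical, chosen only to mirror the [0.8, -0.6, 0.0] example; they are not output from Spark, R, or Julia.

```python
# Hypothetical helpers for comparing two principal-component matrices
# whose columns may differ only by sign.

def align_signs(a, b):
    """Return a copy of b with each column flipped, if needed, so that it
    has a non-negative dot product with the matching column of a."""
    n_rows, n_cols = len(a), len(a[0])
    aligned = [row[:] for row in b]
    for j in range(n_cols):
        dot = sum(a[i][j] * b[i][j] for i in range(n_rows))
        if dot < 0:
            for i in range(n_rows):
                aligned[i][j] = -aligned[i][j]
    return aligned

def max_abs_diff(a, b):
    """Largest element-wise absolute difference between two matrices."""
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))

# Made-up values: the first column of v_other is the negation of v_spark's.
v_spark = [[0.8, 0.6], [-0.6, 0.8], [0.0, 0.0]]
v_other = [[-0.8, 0.6], [0.6, 0.8], [0.0, 0.0]]

raw_diff = max_abs_diff(v_spark, v_other)
aligned_diff = max_abs_diff(v_spark, align_signs(v_spark, v_other))
print(raw_diff, aligned_diff)
```

If the difference stays large after aligning signs, the discrepancy is not just the sign ambiguity and the inputs (centering, scaling, column order) are worth checking next.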


On Sat, Jan 10, 2015 at 8:08 PM, Upul Bandara <upulband...@gmail.com> wrote:
> Hi Xiangrui,
>
> Thanks a lot for your answer.
> I fixed my Julia code and also calculated PCA using R.
>
> R Code:
> -------------
> data <- read.csv('/home/upul/Desktop/iris.csv');
> X <- data[,1:4]
> pca <- prcomp(X, center = TRUE, scale=FALSE)
> transformed <- predict(pca, newdata = X)
>
> Julia Code (Fixed)
> --------------
> data = readcsv("/home/upul/temp/iris.csv");
> X = data[:,1:end-1];
> meanX = mean(X,1);
> m,n = size(X);
> X = X - repmat(meanX, m,1);
> u,s,v = svd(X);
> transformed =  X*v;
>
> Now the PCA results from Julia and R are identical, but I can still see a
> small difference between the PCA values given by Spark and the other two.
>
> Thanks,
> Upul
>
> On Sat, Jan 10, 2015 at 11:17 AM, Xiangrui Meng <men...@gmail.com> wrote:
>>
>> You need to subtract mean values to obtain the covariance matrix
>> (http://en.wikipedia.org/wiki/Covariance_matrix).
>>
>> On Fri, Jan 9, 2015 at 6:41 PM, Upul Bandara <upulband...@gmail.com>
>> wrote:
>> > Hi Xiangrui,
>> >
>> > Thanks for the reply.
>> >
>> > The Julia code also uses the covariance matrix:
>> > (1/n)*X'*X ;
>> >
>> > Thanks,
>> > Upul
>> >
>> > On Fri, Jan 9, 2015 at 2:11 AM, Xiangrui Meng <men...@gmail.com> wrote:
>> >>
>> >> The Julia code is computing the SVD of the Gram matrix. PCA should be
>> >> applied to the covariance matrix. -Xiangrui
>> >>
>> >> On Thu, Jan 8, 2015 at 8:27 AM, Upul Bandara <upulband...@gmail.com>
>> >> wrote:
>> >> > Hi All,
>> >> >
>> >> > I tried to run PCA on the Iris dataset
>> >> > [https://archive.ics.uci.edu/ml/datasets/Iris] using MLlib
>> >> > [http://spark.apache.org/docs/1.1.1/mllib-dimensionality-reduction.html].
>> >> > I also calculated PCA in Julia using the following method:
>> >> >
>> >> > Sigma = (1/numRow(X))*X'*X ;
>> >> > [U, S, V] = svd(Sigma);
>> >> > Ureduced = U(:, 1:k);
>> >> > Z = X*Ureduced;
>> >> >
>> >> > However, I'm seeing a small difference between the values given by MLlib
>> >> > and the method shown above.
>> >> >
>> >> > Does anyone have any idea about this difference?
>> >> >
>> >> > Additionally, I have attached two visualizations, one for each
>> >> > approach.
>> >> >
>> >> > Thanks,
>> >> > Upul
>> >> >
>> >> >
>> >> >
>> >> > ---------------------------------------------------------------------
>> >> > To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>> >> > For additional commands, e-mail: user-h...@spark.apache.org
>> >
>> >
>
>
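
The centering point raised earlier in the thread (Gram matrix versus covariance matrix) can be illustrated with a small sketch. The data and function names below are hypothetical, and the 1/n normalization simply follows the (1/n)*X'*X expression quoted above. A sample covariance would divide by n-1 instead, which rescales the eigenvalues but not the eigenvectors.

```python
# Hypothetical toy data: 3 observations, 2 features (not the Iris values).
x = [[1.0, 2.0], [3.0, 4.0], [5.0, 9.0]]

def column_means(m):
    """Mean of each column."""
    n = len(m)
    return [sum(row[j] for row in m) / n for j in range(len(m[0]))]

def center(m):
    """Subtract the column mean from every entry."""
    mu = column_means(m)
    return [[v - c for v, c in zip(row, mu)] for row in m]

def scaled_gram(m):
    """(1/n) * M' * M, computed without any centering."""
    n, d = len(m), len(m[0])
    return [[sum(row[i] * row[j] for row in m) / n for j in range(d)]
            for i in range(d)]

gram = scaled_gram(x)          # what (1/n)*X'*X gives on raw data
cov = scaled_gram(center(x))   # the same formula applied after centering
mu = column_means(x)

# The two matrices differ by the outer product of the mean vector:
# gram[i][j] - mu[i] * mu[j] == cov[i][j] (up to rounding).
print(gram)
print(cov)
```

Because the two matrices differ whenever the column means are non-zero, their leading eigenvectors generally differ as well, which is consistent with the mismatch reported in the first message.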
