Re: [Rd] svd for Large Matrix

2021-08-16 Thread Avraham Adler
If you’re crossproding X by itself, I think passing symmetric = TRUE to
eigen will eke out more speed.

Avi

On Mon, Aug 16, 2021 at 6:30 PM Radford Neal  wrote:

> > Dario Strbenac  writes:
> >
> > I have a real scenario involving 45 million biological cells
> > (samples) and 60 proteins (variables) which leads to a segmentation
> > fault for svd. I thought this might be a good example of why it
> > might benefit from a long vector upgrade.
>
> Rather than the full SVD of a 4500x60 X, my guess is that you
> may really only be interested in the eigenvalues and eigenvectors of
> X^T X, in which case eigen(t(X)%*%X) would probably be much faster.
> (And eigen(crossprod(X)) would be even faster.)
>
> Note that if you instead want the eigenvalues and eigenvectors of
> X X^T (which is an enormous matrix), the eigenvalues of this are the
> same as those of X^T X, and the eigenvectors are Xv, where v is an
> eigenvector of X^T X.
>
> For example, with R 4.0.2, and the reference BLAS/LAPACK, I get
>
>   > X<-matrix(rnorm(10),1,10)
>   > system.time(for(i in 1:1000) rs<-svd(X))
>  user  system elapsed
> 2.393   0.008   2.403
>   > system.time(for(i in 1:1000) re<-eigen(crossprod(X)))
>  user  system elapsed
> 0.609   0.000   0.609
>   > rs$d^2
>[1] 10568.003 10431.864 10318.959 10219.961 10138.025 10068.566
> 9931.538
>[8]  9813.841  9703.818  9598.532
>   > re$values
>[1] 10568.003 10431.864 10318.959 10219.961 10138.025 10068.566
> 9931.538
>[8]  9813.841  9703.818  9598.532
>
> Possibly some other LAPACK might implement svd better, though I
> suspect that R will allocate more big matrices than really necessary
> for the svd even aside from whatever LAPACK is doing.
>
> Regards,
>
>Radford Neal
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>
-- 
Sent from Gmail Mobile

[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] svd for Large Matrix

2021-08-16 Thread Radford Neal
> Dario Strbenac  writes:
>
> I have a real scenario involving 45 million biological cells
> (samples) and 60 proteins (variables) which leads to a segmentation
> fault for svd. I thought this might be a good example of why it
> might benefit from a long vector upgrade.

Rather than the full SVD of a 4500x60 X, my guess is that you
may really only be interested in the eigenvalues and eigenvectors of
X^T X, in which case eigen(t(X)%*%X) would probably be much faster.
(And eigen(crossprod(X)) would be even faster.)

Note that if you instead want the eigenvalues and eigenvectors of
X X^T (which is an enormous matrix), the eigenvalues of this are the
same as those of X^T X, and the eigenvectors are Xv, where v is an
eigenvector of X^T X.

For example, with R 4.0.2, and the reference BLAS/LAPACK, I get

  > X<-matrix(rnorm(10),1,10)
  > system.time(for(i in 1:1000) rs<-svd(X))
 user  system elapsed
2.393   0.008   2.403
  > system.time(for(i in 1:1000) re<-eigen(crossprod(X)))
 user  system elapsed
0.609   0.000   0.609
  > rs$d^2
   [1] 10568.003 10431.864 10318.959 10219.961 10138.025 10068.566  9931.538
   [8]  9813.841  9703.818  9598.532
  > re$values
   [1] 10568.003 10431.864 10318.959 10219.961 10138.025 10068.566  9931.538
   [8]  9813.841  9703.818  9598.532

Possibly some other LAPACK might implement svd better, though I
suspect that R will allocate more big matrices than really necessary
for the svd even aside from whatever LAPACK is doing.

Regards,

   Radford Neal

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] svd For Large Matrix

2021-08-13 Thread Dario Strbenac via R-devel
Good day,

I have a real scenario involving 45 million biological cells (samples) and 60 
proteins (variables) which leads to a segmentation fault for svd. I thought 
this might be a good example of why it might benefit from a long vector upgrade.

test <- matrix(rnorm(4500*60), ncol = 60)
testSVD <- svd(test)

 *** caught segfault ***
address 0x7fe93514d618, cause 'memory not mapped'

Traceback:
 1: La.svd(x, nu, nv)
 2: svd(test)

--
Dario Strbenac
University of Sydney
Camperdown NSW 2050
Australia

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel