Re: [Rd] svd for Large Matrix
If you’re crossproding X by itself, I think passing symmetric = TRUE to eigen will eke out more speed. Avi On Mon, Aug 16, 2021 at 6:30 PM Radford Neal wrote: > > Dario Strbenac writes: > > > > I have a real scenario involving 45 million biological cells > > (samples) and 60 proteins (variables) which leads to a segmentation > > fault for svd. I thought this might be a good example of why it > > might benefit from a long vector upgrade. > > Rather than the full SVD of a 4500x60 X, my guess is that you > may really only be interested in the eigenvalues and eigenvectors of > X^T X, in which case eigen(t(X)%*%X) would probably be much faster. > (And eigen(crossprod(X)) would be even faster.) > > Note that if you instead want the eigenvalues and eigenvectors of > X X^T (which is an enormous matrix), the eigenvalues of this are the > same as those of X^T X, and the eigenvectors are Xv, where v is an > eigenvector of X^T X. > > For example, with R 4.0.2, and the reference BLAS/LAPACK, I get > > > X<-matrix(rnorm(10),1,10) > > system.time(for(i in 1:1000) rs<-svd(X)) > user system elapsed > 2.393 0.008 2.403 > > system.time(for(i in 1:1000) re<-eigen(crossprod(X))) > user system elapsed > 0.609 0.000 0.609 > > rs$d^2 >[1] 10568.003 10431.864 10318.959 10219.961 10138.025 10068.566 > 9931.538 >[8] 9813.841 9703.818 9598.532 > > re$values >[1] 10568.003 10431.864 10318.959 10219.961 10138.025 10068.566 > 9931.538 >[8] 9813.841 9703.818 9598.532 > > Possibly some other LAPACK might implement svd better, though I > suspect that R will allocate more big matrices than really necessary > for the svd even aside from whatever LAPACK is doing. > > Regards, > >Radford Neal > > __ > R-devel@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-devel > -- Sent from Gmail Mobile [[alternative HTML version deleted]] __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] svd for Large Matrix
> Dario Strbenac writes: > > I have a real scenario involving 45 million biological cells > (samples) and 60 proteins (variables) which leads to a segmentation > fault for svd. I thought this might be a good example of why it > might benefit from a long vector upgrade. Rather than the full SVD of a 4500x60 X, my guess is that you may really only be interested in the eigenvalues and eigenvectors of X^T X, in which case eigen(t(X)%*%X) would probably be much faster. (And eigen(crossprod(X)) would be even faster.) Note that if you instead want the eigenvalues and eigenvectors of X X^T (which is an enormous matrix), the eigenvalues of this are the same as those of X^T X, and the eigenvectors are Xv, where v is an eigenvector of X^T X. For example, with R 4.0.2, and the reference BLAS/LAPACK, I get > X<-matrix(rnorm(10),1,10) > system.time(for(i in 1:1000) rs<-svd(X)) user system elapsed 2.393 0.008 2.403 > system.time(for(i in 1:1000) re<-eigen(crossprod(X))) user system elapsed 0.609 0.000 0.609 > rs$d^2 [1] 10568.003 10431.864 10318.959 10219.961 10138.025 10068.566 9931.538 [8] 9813.841 9703.818 9598.532 > re$values [1] 10568.003 10431.864 10318.959 10219.961 10138.025 10068.566 9931.538 [8] 9813.841 9703.818 9598.532 Possibly some other LAPACK might implement svd better, though I suspect that R will allocate more big matrices than really necessary for the svd even aside from whatever LAPACK is doing. Regards, Radford Neal __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
[Rd] svd For Large Matrix
Good day, I have a real scenario involving 45 million biological cells (samples) and 60 proteins (variables) which leads to a segmentation fault for svd. I thought this might be a good example of why it might benefit from a long vector upgrade. test <- matrix(rnorm(4500*60), ncol = 60) testSVD <- svd(test) *** caught segfault *** address 0x7fe93514d618, cause 'memory not mapped' Traceback: 1: La.svd(x, nu, nv) 2: svd(test) -- Dario Strbenac University of Sydney Camperdown NSW 2050 Australia __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel