On 2/16/11 3:55 PM, "Douglas Bates" <ba...@stat.wisc.edu> wrote: >I decided that it would be simple enough to write a small package to >show exactly how you would pass a dgCMatrix to an Rcpp-based C++ >function and create a smat object.
Hi Doug, did you ever get a test case working for this? As a comparison, here's some code that calls SVDLIBC's 'svd' executable (which I put into /usr/local/bin) successfully on a sparse matrix of class dgCMatrix. This takes 4 CPU seconds to accomplish, with only 0.132 seconds of that time actually computing the SVD. The rest is reading & writing the matrix files. ============================================= write.sparse <- function (m, to) { ## Writes in a format that SVDLIBC can read stopifnot(inherits(m, "dgCMatrix")) fh <- file(to, open="w") wl <- function(...) cat(..., "\n", file=fh) ## header wl(dim(m), length(m@x)) globalCount <- 1 nper <- diff(m@p) for(i in 1:ncol(m)) { wl(nper[i]) ## Number of entries in this column if (nper[i]==0) next for(j in 1:nper[i]) { wl(m@i[globalCount], m@x[m@p[i]+j]) globalCount <- globalCount+1 } } } my.svd <- function(x, nu, nv) { stopifnot(nu==nv) write.sparse(x, "/tmp/sparse.m") rc <- system(paste("/usr/local/bin/svd -o /tmp/sout -d", nu, "/tmp/sparse.m")) if (rc != 0) stop("Couldn't run external svd code") d <- scan("/tmp/sout-S", skip=1) ut <- read.table("/tmp/sout-Ut", skip=1) vt <- read.table("/tmp/sout-Vt", skip=1) list(d=d, u=t(ut), v=t(vt)) } set.seed(123) A <- sparseMatrix(i=sample(10000, 1000), j=sample(12000, 1000), x=runif(1000), dims=c(10000, 12000)) res <- my.svd(A, 3, 3) # Loading the matrix... # Computing the SVD... # SOLVING THE [A^TA] EIGENPROBLEM # NO. OF ROWS = 10000 # NO. OF COLUMNS = 12000 # NO. OF NON-ZERO VALUES = 1000 # MATRIX DENSITY = 0.00% # MAX. NO. OF LANCZOS STEPS = 10000 # MAX. NO. OF EIGENPAIRS = 3 # LEFT END OF THE INTERVAL = -1.00E-30 # RIGHT END OF THE INTERVAL = 1.00E-30 # KAPPA = 1.00E-06 # # TRANSPOSING THE MATRIX FOR SPEED # NUMBER OF LANCZOS STEPS = 169 # RITZ VALUES STABILIZED = 4 # SINGULAR VALUES FOUND = 3 # # ELAPSED CPU TIME = 0.132 sec. # MULTIPLICATIONS BY A = 176 # MULTIPLICATIONS BY A^T = 173 res$d [1] 0.999651 0.999274 0.997689 dim(res$u) [1] 10000 3 dim(res$v) [1] 12000 3 ============================================= -- Ken Williams Senior Research Scientist Thomson Reuters Phone: 651-848-7712 ken.willi...@thomsonreuters.com http://labs.thomsonreuters.com > The package is at >http://www.stat.wisc.edu/~bates/spSVD_1.0.tar.gz It manages to pass >the values from the dgCMatrix object to the smat struct, although this >was somewhat more difficult than I had anticipated because the smat >structure uses long int and not int values. So if you look at SVD.cpp >in that package you will see that I allocate std::vector<long> objects >then copy the contents of the Rcpp::IntegerVector objects into them. > >There was one more "gotcha". There is a C function called "machar" >defined in the las2.c file and there is one in the R API. The one in >the R API wins because R is loaded before the dll for the package and >the calls are not compatible. I changed the name of the one in las2.c >to svd_machar. I'm currently running a test case > >> library(Matrix) >Loading required package: lattice > >Attaching package: 'Matrix' > >The following object(s) are masked from 'package:base': > > det > >> library(spSVD) >Loading required package: Rcpp >Warning message: >In loadNamespace(package, c(which.lib.loc, lib.loc), keep.source = >keep.source) : > package 'spSVD' contains no R code >> data(KNex) >> str(KNex$mm) >Formal class 'dgCMatrix' [package "Matrix"] with 6 slots > ..@ i : int [1:8755] 0 2 25 27 163 165 1258 1261 1276 1278 ... > ..@ p : int [1:713] 0 13 17 26 38 43 52 56 61 67 ... > ..@ Dim : int [1:2] 1850 712 > ..@ Dimnames:List of 2 > .. ..$ : NULL > .. ..$ : NULL > ..@ x : num [1:8755] 0.277 0.277 0.277 0.277 0.277 ... > ..@ factors : list() >> .Call("spSVD", KNex$mm, 713L) >SOLVING THE [A^TA] EIGENPROBLEM >NO. OF ROWS = 1850 >NO. OF COLUMNS = 712 >NO. OF NON-ZERO VALUES = 8755 >MATRIX DENSITY = 0.66% >MAX. NO. OF LANCZOS STEPS = 712 >MAX. NO. OF EIGENPAIRS = 712 >LEFT END OF THE INTERVAL = -1.00E-30 >RIGHT END OF THE INTERVAL = 1.00E-30 >KAPPA = 1.00E-06 > >but it is taking a long time. I don't know if it is a programming >error or the wrong calling sequence or just a slow algorithm but it >has been chugging away for 7 minutes. In constrast, it takes about 2 >seconds to calculate the singular values of that matrix as a dense >matrix > >> data(KNex) >> mm1 <- as(KNex$mm, "matrix") >> str(mm1) > num [1:1850, 1:712] 0.277 0 0.277 0 0 ... > - attr(*, "dimnames")=List of 2 > ..$ : NULL > ..$ : NULL > >> system.time(dsvd <- La.svd(mm1, nu=0, nv=0)) > user system elapsed > 2.280 0.030 2.664 > _______________________________________________ Rcpp-devel mailing list Rcpp-devel@lists.r-forge.r-project.org https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel