I need to sum the columns of a sparse matrix according to a factor,
i.e., given a sparse matrix X and a factor fac of length ncol(X), sum
the columns that share a factor level and return the sparse matrix Y of
size nrow(X) by nlevels(fac). The appended code does the job, but uses
an unacceptable amount of memory because tapply() builds a non-sparse
(dense) result. Can anyone suggest a more memory- and CPU-efficient
approach, e.g., a sparse matrix tapply method? Thanks.
--
+--------------------------------------------------------------+
| Jon Stearley (505) 845-7571 (FAX 844-9297) |
| Sandia National Laboratories Scalable Systems Integration |
+--------------------------------------------------------------+
# x and y are of SparseM class matrix.csr
# (slots: ra = nonzero values, ja = column indices, ia = row pointers)
"aggregate.csr" <-
function(x, fac) {
  # make a vector indicating the row of each nonzero
  rows <- integer(length=length(x@ra))
  rows[x@ia[1:nrow(x)]] <- 1 # put a 1 at start of each row
  rows <- as.integer(cumsum(rows)) # and finish with a cumsum
  # make a vector indicating the column factor of each nonzero
  f <- fac[x@ja]
  # aggregate by row,f (this is the step that produces a dense matrix)
  y <- tapply(x@ra, list(rows,f), sum)
  # sparsify it
  y[is.na(y)] <- 0 # change tapply NAs to as.matrix.csr 0s
  y <- as.matrix.csr(y)
  y
}
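
One possible direction, offered as a sketch rather than a tested
solution: summing columns by factor level is the same as multiplying X
on the right by an ncol(X) by nlevels(fac) indicator matrix that has a
single 1 in each row. That indicator matrix has only ncol(X) nonzeros,
so it can be built directly in csr form and the whole computation stays
sparse. The helper name aggregate_csr_product is hypothetical, and the
sketch assumes that fac has no NAs and that SparseM's %*% method for two
matrix.csr operands is used.

```r
library(SparseM)

# Hypothetical helper: sum the columns of a matrix.csr within each level
# of fac, using a sparse product instead of tapply().
aggregate_csr_product <- function(x, fac) {
  fac <- as.factor(fac)
  n <- ncol(x)
  # Assumption: one factor value per column of x, none of them NA.
  stopifnot(length(fac) == n, !anyNA(fac))
  # Indicator matrix (n by nlevels(fac)): row j holds a single 1 in the
  # column given by the level code of fac[j].
  m <- new("matrix.csr",
           ra = rep(1, n),                 # the n nonzero values
           ja = as.integer(fac),           # column index of each nonzero
           ia = seq_len(n + 1L),           # one nonzero per row
           dimension = as.integer(c(n, nlevels(fac))))
  x %*% m
}

# Small usage example with made-up data.
dense <- matrix(c(1, 0, 2,
                  0, 3, 0,
                  4, 0, 5,
                  0, 6, 0), nrow = 3)      # 3 rows, 4 columns
fac <- factor(c("a", "b", "a", "b"))
y <- aggregate_csr_product(as.matrix.csr(dense), fac)
as.matrix(y)                               # dense view, for inspection only
```

Unlike the tapply() version, this never allocates an nrow(X) by
nlevels(fac) dense array, and rows of X that contain no nonzeros still
appear in the result as empty rows.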