[R] memory-efficient column aggregation of a sparse matrix

Jon Stearley Wed, 31 Jan 2007 18:04:05 -0800

I need to sum the columns of a sparse matrix according to a factor,
i.e. given a sparse matrix X and a factor fac of length ncol(X), sum
the columns that share a factor level and return the sparse matrix Y of
size nrow(X) by nlevels(fac).  The appended code does the job, but is
unacceptably memory-bound because tapply() returns a dense (non-sparse)
result.  Can anyone suggest a more memory- and CPU-efficient
approach?  E.g., a sparse matrix tapply method?  Thanks.


-- 
+--------------------------------------------------------------+
| Jon Stearley                  (505) 845-7571  (FAX 844-9297) |
| Sandia National Laboratories  Scalable Systems Integration   |
+--------------------------------------------------------------+


# x and y are of SparseM class matrix.csr
# (slots: ra = nonzero values, ja = column indices, ia = row pointers)
"aggregate.csr" <-
function(x, fac) {
         # make a vector indicating the row of each nonzero
         # (this assumes every row of x has at least one nonzero)
         rows <- integer(length=length(x@ra))
         rows[x@ia[1:nrow(x)]] <- 1 # put a 1 at start of each row
         rows <- as.integer(cumsum(rows)) # and finish with a cumsum

         # make a vector indicating the column factor of each nonzero
         f <- fac[x@ja]

         # aggregate by row,f (tapply returns a dense array here)
         y <- tapply(x@ra, list(rows,f), sum)

         # sparsify it
         y[is.na(y)] <- 0  # change tapply NAs to as.matrix.csr 0s
         y <- as.matrix.csr(y)

         y
}
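
One possible way to avoid the dense intermediate is to work directly on
the CSR vectors: sort the nonzeros by (row, factor level) and sum each run
of equal pairs.  The sketch below is an illustration only, not a SparseM
function; the name aggregate_csr_cols and its argument list are
hypothetical.  It takes the three CSR vectors (values ra, column indices
ja, row pointers ia) plus the factor, and assumes at least one nonzero and
no NA in fac.

```r
# Sketch only: aggregate_csr_cols() is a hypothetical helper, not part of SparseM.
# ra, ja, ia follow the CSR layout: values, column indices, row pointers.
aggregate_csr_cols <- function(ra, ja, ia, fac) {
    fac  <- as.factor(fac)
    nr   <- length(ia) - 1L
    nlev <- nlevels(fac)

    # Row of each nonzero; diff(ia) counts nonzeros per row, so empty
    # rows simply contribute nothing.
    rows <- rep(seq_len(nr), diff(ia))
    # Factor level (the output column) of each nonzero.
    grp  <- as.integer(fac)[ja]

    # Order nonzeros by (row, level) so equal pairs become adjacent.
    ord  <- order(rows, grp)
    rows <- rows[ord]
    grp  <- grp[ord]
    val  <- ra[ord]

    # Mark the first entry of each run of identical (row, level) pairs.
    first <- c(TRUE, diff(rows) != 0 | diff(grp) != 0)
    run   <- cumsum(first)

    # Sum each run; only the nonzero cells of the result are ever stored.
    list(ra = as.vector(rowsum(val, run)),
         ja = grp[first],
         ia = as.integer(cumsum(c(1L, tabulate(rows[first], nbins = nr)))),
         dimension = c(nr, nlev))
}

# Small example: a 3 x 4 matrix with an empty second row, columns
# grouped as a, b, a, b.
res <- aggregate_csr_cols(ra = c(1, 2, 3, 4, 5),
                          ja = c(1L, 3L, 2L, 3L, 4L),
                          ia = c(1L, 3L, 3L, 6L),
                          fac = factor(c("a", "b", "a", "b")))
```

Memory use here is proportional to the number of nonzeros rather than to
nrow(X) * nlevels(fac).  With SparseM loaded, the returned pieces should
be usable as the ra, ja, ia and dimension slots of a new matrix.csr
object, though that step is not exercised in this sketch.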

