Hi David and Dimitris, Thanks for your suggestions. They are very helpful.
Jeff

On Wed, November 11, 2009 12:12 pm, David Winsemius wrote:
>
> On Nov 11, 2009, at 10:57 AM, David Winsemius wrote:
>
>>
>> On Nov 11, 2009, at 10:36 AM, Dimitris Rizopoulos wrote:
>>
>>> one approach is the following:
>>>
>>> mat <- rbind(c(-1, -1, 2, NA), c(3, 3, -2, -1), c(1, 1, NA, -2))
>>>
>>> mat / ave(abs(mat), row(mat), sign(mat), FUN = sum)
>>
>> Very elegant. My solution was a bit more pedestrian, but may have
>> some speed advantage:
>>
>
> I am wondering if there might be further performance improvements if
> sums were pre-calculated before the ifelse scaling step.
>
> Perhaps:
>
> > mat <- matrix(sample(-4:4, 100, replace=T), ncol=10)
> > system.time(replicate(10000, t(apply(mat, 1, function(x) {
>     negs <- sum(x[x<0], na.rm=T); poss <- sum(x[x>0], na.rm=T);
>     ifelse( x <0, -x/negs, x/poss)} ) ) ) )
>    user  system elapsed
>   9.420   0.103   9.619
>
> > system.time(replicate(10000, t( apply(mat, 1, function(x)
>     ifelse( x <0, -x/sum(x[x<0], na.rm=T), x/sum(x[x>0], na.rm=T) ) ) ) ) )
>    user  system elapsed
>   8.206   0.035   8.231
>
> That was only a 15% improvement but I got a 50% improvement by
> replacing the ifelse() with its Boolean algebra equivalent:
>
> > t( apply(mat, 1, function(x) -x*(x <0)/sum(x[x<0], na.rm=T) +
>     x*(x>0)/sum(x[x>0], na.rm=T) ) )
>      [,1] [,2]       [,3]       [,4]
> [1,] -0.5 -0.5  1.0000000         NA
> [2,]  0.5  0.5 -0.6666667 -0.3333333
> [3,]  0.5  0.5         NA -1.0000000
>
> > system.time(replicate(10000, t( apply(mat, 1, function(x)
>     -x*(x <0)/sum(x[x<0], na.rm=T) + x*(x>0)/sum(x[x>0], na.rm=T) ) ) ))
>    user  system elapsed
>   4.805   0.041   4.839
>
> I could not figure out how Jeff applied the two functions he
> presented, so I am unable to compare any of these methods to his
> strategy.
>
> --
> David.
>
>>
>> > system.time(replicate(10000, t( apply(mat, 1, function(x)
>>     ifelse( x <0, -x/sum(x[x<0], na.rm=T), x/sum(x[x>0], na.rm=T) ) ) ) ) )
>>    user  system elapsed
>>   5.958   0.027   5.977
>>
>> > system.time(replicate(10000, mat / ave(abs(mat), row(mat),
>>     sign(mat), FUN = sum) ) )
>>    user  system elapsed
>>  12.886   0.064  12.886
>>
>> --
>> David
>>
>>> I hope it helps.
>>>
>>> Best,
>>> Dimitris
>>>
>>> Hao Cen wrote:
>>>
>>>> Hi,
>>>> I have a matrix with positive numbers, negative numbers, and NAs. An
>>>> example of the matrix is as follows
>>>>
>>>>   -1   -1    2   NA
>>>>    3    3   -2   -1
>>>>    1    1   NA   -2
>>>>
>>>> I need to compute a scaled version of this matrix. The scaling
>>>> method is dividing each positive number in each row by the sum of
>>>> positive numbers in that row and dividing each negative number in
>>>> each row by the sum of absolute values of negative numbers in that
>>>> row. So the resulting matrix would be
>>>>
>>>> -1/2 -1/2  2/2   NA
>>>>  3/6  3/6 -2/3 -1/3
>>>>  1/2  1/2   NA -2/2
>>>>
>>>> Is there an efficient way to do that in R? One way I am using is
>>>>
>>>> 1. rowSums for positive numbers in the matrix
>>>> 2. rowSums for negative numbers in the matrix
>>>> 3. sweep(mat, 1, posSumVec, posDivFun)
>>>> 4. sweep(mat, 1, negSumVec, negDivFun)
>>>>
>>>> posDivFun = function(x,y) {
>>>>   xPosId = x>0 & !is.na(x)
>>>>   x[xPosId] = x[xPosId]/y[xPosId]
>>>>   return(x)
>>>> }
>>>> negDivFun = function(x,y) {
>>>>   xNegId = x<0 & !is.na(x)
>>>>   x[xNegId] = -x[xNegId]/y[xNegId]
>>>>   return(x)
>>>> }
>>>>
>>>> It is not fast enough though. This scaling is to be applied to
>>>> large data sets repetitively. I would like to make it as fast as
>>>> possible. Any thoughts on improving it would be appreciated.
>>>> Thanks
>>>> Jeff
>>>> ______________________________________________
>>>> R-help@r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide
>>>> http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>> --
>>> Dimitris Rizopoulos
>>> Assistant Professor
>>> Department of Biostatistics
>>> Erasmus University Medical Center
>>>
>>> Address: PO Box 2040, 3000 CA Rotterdam, the Netherlands
>>> Tel: +31/(0)10/7043478
>>> Fax: +31/(0)10/7043014
>>
>> David Winsemius, MD
>> Heritage Laboratories
>> West Hartford, CT
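
For comparison, the rowSums-based strategy described in the original question can probably be written without sweep() or apply() at all. The following is only a minimal sketch, not code posted in the thread: the function name scale_by_sign and the rule that zero entries stay 0 are assumptions made here for illustration. It relies on the fact that a vector of length nrow(mat) is recycled down the columns, so entry [i, j] is paired with the i-th row sum.

```r
# Hypothetical helper (name and zero-handling are assumptions, not from the thread).
# Positive entries are divided by the row's sum of positives, negative entries
# by the row's sum of absolute negatives; NA entries stay NA.
scale_by_sign <- function(mat) {
  pos <- mat > 0                                 # logical matrix, NA where mat is NA
  neg <- mat < 0
  pos_sum <- rowSums(mat * pos, na.rm = TRUE)    # per-row sum of positive values
  neg_sum <- rowSums(-mat * neg, na.rm = TRUE)   # per-row sum of |negative values|
  # Build the per-entry denominator; the length-nrow vectors recycle down
  # the columns, so row i always sees pos_sum[i] or neg_sum[i].
  denom <- pos * pos_sum + neg * neg_sum
  out <- mat / denom
  out[which(mat == 0)] <- 0                      # 0/0 would be NaN; keep zeros as 0
  out
}

mat <- rbind(c(-1, -1, 2, NA), c(3, 3, -2, -1), c(1, 1, NA, -2))
scale_by_sign(mat)
```

On the example matrix this reproduces the scaled values given in the original question. It has not been timed against the apply() versions above, so any speed difference would need to be checked with system.time() on the real data.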