You might also want to consider _partial sorting_ by using the 'partial' argument of sort(), especially when the number of data points is really large.
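
As a rough illustration of what 'partial' guarantees, here is a minimal sketch on made-up data (the vector x, its length and n are only illustrative, not taken from the thread). After sort(x, partial=n), element n sits where a full sort would put it, with nothing larger before it and nothing smaller after it, but the first n elements are not necessarily in order among themselves.

```r
# Illustrative data only; the size and seed are arbitrary assumptions
set.seed(42)
x <- rnorm(1000)
n <- 50

# Partial sort: only position n is guaranteed to hold its fully sorted value
y <- sort(x, partial = n)

# Element n equals the n-th smallest value of x
stopifnot(y[n] == sort(x)[n])

# Everything before position n is <= y[n], everything after is >= y[n]
stopifnot(all(y[seq_len(n)] <= y[n]))
stopifnot(all(y[n:length(y)] >= y[n]))

# The n smallest values are therefore y[1:n]; sorting just these n
# elements is cheap compared with sorting all of x
smallest <- sort(y[seq_len(n)])
stopifnot(identical(smallest, sort(x)[seq_len(n)]))
```

This is why a second, small sort() on the first n elements is still needed when the values have to come out in order.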
Since 'decreasing=TRUE' is not supported together with 'partial', you have to flip the order yourself by negating the values, e.g.

x <- rnorm(8e6)
is.na(x) <- sample(length(x), size=1e6)
n <- 50

# Full sort, then keep the top n (sort() drops NAs by default)
t1 <- system.time({
  x1 <- sort(x, decreasing=TRUE)
  x1h <- x1[1:n]
})

# Partial sort of the negated values, then fully sort only the first n
t2 <- system.time({
  x2 <- sort(-x, partial=n)
  x2h <- -sort(x2[1:n])
})

stopifnot(identical(x2h, x1h))
print(t2/t1)
     user    system   elapsed
0.3076923 0.7777778 0.3491525

/Henrik

On Fri, Jun 18, 2010 at 1:20 PM, Peter Ehlers <ehl...@ucalgary.ca> wrote:
>
>  m <- matrix(round(rnorm(4000 * 2000), 4), nr = 4000)
>  is.na(m) <- sample(8e6, 1e6)
>
>  system.time(
>    idx <- which(
>             matrix(m %in% head(sort(m, TRUE), 50),
>                    nr = nrow(m)), arr.ind = TRUE))
>
>  #   user  system elapsed
>  #   3.12    0.19    3.18
>
>  -Peter Ehlers
>
>
> On 2010-06-18 5:13, Dennis Murphy wrote:
>>
>> Hi:
>>
>> Here's a faked up example:
>>
>> a <- matrix(rnorm(4000*2000), 4000, 2000)
>> # Generate some NAs in the matrix
>> nr <- sample(1:4000, 50)
>> nc <- sample(1:2000, 50)
>> a[nr, nc] <- NA
>>
>> # convert to data frame:
>> b <- data.frame(row = rep(1:4000, 2000), col = rep(1:2000, each = 4000),
>>                 x = as.vector(a))
>> # relatively time consuming...about 13.5 s on my machine
>> bb <- b[rev(order(b$x, na.last = FALSE)), ]
>>
>> bb[1:10, ]
>>
>>          row  col        x
>> 691269  3269  173 5.103704
>> 7815076 3076 1954 4.961544
>> 4999621 3621 1250 4.953265
>> 500469   469  126 4.937655
>> 5878224 2224 1470 4.929150
>> 4287270 3270 1072 4.913791
>> 4442521 2521 1111 4.896869
>> 4668867  867 1168 4.863504
>> 5716575  575 1430 4.760778
>> 3055274 3274  764 4.758995
>>
>> HTH,
>> Dennis
>>
>>
>> On Thu, Jun 17, 2010 at 10:41 PM,
>> uschlecht <ulrich.schle...@stanford.edu> wrote:
>>
>>>
>>> Hi,
>>>
>>> I have a huge matrix (4000 * 2000 data points) and I would like to
>>> retrieve the coordinates (column and row) for the top 50 (or x) values.
>>> Some positions in the matrix have NA as a value. These should be
>>> discarded.
>>>
>>> My current method is to replace all NAs by 0, then rank all the values
>>> and then extract the positions with the 50 highest ranks. It is very
>>> time-consuming!
>>>
>>> Is there a simpler way to do this?
>>>
>>> Thank you,
>>> Ulrich
>>>
>>> --
>>> View this message in context:
>>> http://r.789695.n4.nabble.com/Find-the-50-highest-values-in-a-matrix-tp2259721p2259721.html
>>>
>>> ______________________________________________
>>> R-help@r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.