> From: Sundar Dorai-Raj
>
> Liaw, Andy wrote:
>
> > Dear R-help,
> >
> > Let's say `x1' and `x2' are very long vectors (length=5e5, say) with same
> > set of names but in different order.  If I want to sort `x2' in the order of
> > `x1', I would do
> >
> >   x2[names(x1)]
> >
> > but the amount of time that takes is quite prohibitive!  Does anyone have
> > any suggestion on a more efficient way to do this?
> >
> > If the two vectors are exactly the same length (as I said above), sorting
> > both by names would probably be the fastest.  However, if the two vectors
> > differ in length (and the names for the shorter one are a subset of names of
> > the longer one) then that doesn't work...
> >
> > Best,
> > Andy
>
> Hi Andy,
>
> Using match seems to be *much* faster:
>
> R> x1 <- 1:10000; names(x1) <- 1:10000
> R> x2 <- 1:10000; names(x2) <- 10000:1
> R> system.time(x3 <- x1[names(x2)])
> [1] 1.88 0.00 1.88   NA   NA
> R> system.time(x4 <- x1[match(names(x1), names(x2))])
> [1] 0.01 0.00 0.01   NA   NA
> R> all.equal(x3, x4)
> [1] TRUE
> R>
>
> This should also work if x1 and x2 are of different lengths.
>
> --sundar

Sundar,

Thanks very much for the tip!  However, I think the arguments to match() are
backward:

> n = 1e4
> x1 = sample(n)
> x2 = sample(n)
> names(x1) = sample(n)
> names(x2) = sample(n)
> system.time(x3 <- x1[names(x2)])
[1] 5.71 0.00 6.02   NA   NA
> system.time(x4 <- x1[match(names(x1),names(x2))])
[1] 0.03 0.00 0.03   NA   NA
> all.equal(x3, x4)
[1] "Names: 9997 string mismatches" "Mean relative difference: 0.669837"
> names(x3[1:5])
[1] "5391" "9927" "6499" "1863" "8287"
> names(x4[1:5])
[1] "2560" "9914" "6348" "1291" "5718"
> system.time(x4 <- x1[match(names(x2),names(x1))])
[1] 0.03 0.00 0.03   NA   NA
> names(x4[1:5])
[1] "5391" "9927" "6499" "1863" "8287"
> all.equal(x3, x4)
[1] TRUE

[Admittedly this is why I rarely use match(): I get mixed up easily.]

Reid: It isn't a memory problem.  For vectors of length 6e5, I killed the R
process after more than 5 hours on an Opteron 248.  The R process was taking
up about 114MB of RAM, out of the 8GB in the box.

I'm rather surprised that such a seemingly simple operation would take so
long, especially when sorting such vectors is very fast.  What am I missing?

Best,
Andy

______________________________________________
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
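
One way to avoid mixing up the argument order is to wrap the call in a small
helper.  The following is only a sketch: the name reorder_by_names is
hypothetical (it does not come from this thread), and it assumes that names
are unique and that every name in the template also occurs in the vector
being reordered.

```r
# Hypothetical helper (not from the thread): return `x` reordered so that
# its names follow the order of names(template).
# Assumes unique names and names(template) being a subset of names(x).
reorder_by_names <- function(x, template) {
  # match(wanted, available): for each wanted name, its position in names(x)
  idx <- match(names(template), names(x))
  x[idx]
}

set.seed(1)                        # arbitrary seed, for reproducibility only
n  <- 1000
x1 <- sample(n); names(x1) <- sample(n)
x2 <- sample(n); names(x2) <- sample(n)

by_name  <- x1[names(x2)]          # character indexing (slow on long vectors)
by_match <- reorder_by_names(x1, x2)
stopifnot(identical(by_name, by_match))

# Shorter template whose names are a subset of names(x1)
short <- x2[1:10]
stopifnot(identical(reorder_by_names(x1, short), x1[names(short)]))
```

It may help to read match(a, b) as "for each element of a, give its position
in b": the names you want go first, the names you have go second.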
