[R] Quickest way to match two vectors besides %in%?
Hello list,

I have two data frames, X (48469,2) and Y (79771,5). X[,1] contains distinct values of Y[,2]. I want to match values in X[,1] and Y[,2], then take the corresponding value in X[,2] and place it in Y[,4].

So far I have been doing it like so:

    for(i in 1:48469) { y[which(x[i,1]==y[,3]),4] <- x[i,2] }

But it chunks along so very slowly that I can't help but wonder if there's a faster way, mainly because on my box it takes R about 30 seconds to simply COUNT to 48,469 in the for loop.

I have already tried using %in%. It tells me if the values in X[,1] are IN Y[,2], which is useful in removing unnecessary values from X[,1]. But it does not tell me exactly where they match. which(X[,1] %in% Y[,2]) does, but it only matches on the first instance.

This is the slowest part of the script I'm working on. If I could improve it I could shave off some serious operating time. Any pointers?

Regards,
Pete

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
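For readers following the thread, the difference between the three lookups mentioned above can be sketched on toy vectors. The names `keys` and `lookup` are made up for illustration (standing in for X[,1] and Y[,2]); this is not the poster's data.

```r
# Hypothetical stand-ins: keys plays the role of X[,1] (distinct values),
# lookup plays the role of Y[,2] (values that may repeat).
keys   <- c("a", "b", "c")
lookup <- c("b", "a", "b", "d")

# %in% answers "is each element of keys present anywhere in lookup?"
present <- keys %in% lookup

# which() turns that into positions, but positions within keys, not lookup.
present_pos <- which(keys %in% lookup)

# match() answers "where in keys does each element of lookup sit?"
# Every repeat in lookup gets its own position; absent values give NA.
positions <- match(lookup, keys)

print(present)      # TRUE TRUE FALSE
print(present_pos)  # 1 2
print(positions)    # 2 1 2 NA
```

The last form is the one that lines up row by row with the larger table, which is why it can replace the loop.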
Re: [R] Quickest way to match two vectors besides %in%?
On 11/8/2005 2:28 PM, Pete Cap wrote:
> I have two data frames, X (48469,2) and Y (79771,5). X[,1] contains distinct values of Y[,2]. I want to match values in X[,1] and Y[,2], then take the corresponding value in X[,2] and place it in Y[,4].
> [...]
> This is the slowest part of the script I'm working on. If I could improve it I could shave off some serious operating time. Any pointers?

Look at the merge() function to add the X and Y columns to a new dataframe, then process that to merge the X[,2] and Y[,4] values. It will be something like

    Z <- merge(X, Y, by.x=1, by.y=2, all.y=TRUE)
    changes <- !is.na(Z[,2])
    Z[changes,5] <- Z[changes,2]

but you are almost certainly better off (from a maintenance point of view) using the names of the columns, rather than guessing at column numbers.

Duncan Murdoch
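A minimal sketch of the same merge() idea using column names instead of positions. The data frames and the column names (`key`, `new_value`, `id`, `value`) are invented for illustration; the real X and Y will have their own names.

```r
# Hypothetical small versions of X and Y.
X <- data.frame(key = c(1, 2, 3), new_value = c(50, 60, 70))
Y <- data.frame(id = 1:5, key = c(2, 1, 2, 9, 3), value = c(0, 0, 0, 0, 0))

# all.y = TRUE keeps every row of Y, including keys that X does not have;
# those rows get NA in the columns that come from X.
Z <- merge(X, Y, by = "key", all.y = TRUE)

# Overwrite value only where X supplied a replacement.
changes <- !is.na(Z$new_value)
Z$value[changes] <- Z$new_value[changes]

# merge() may reorder rows, so restore Y's original order via id.
Z <- Z[order(Z$id), ]
print(Z)
```

Note that merge() does not promise to keep the row order of Y, which is why the sketch carries an `id` column and sorts on it afterwards.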
Re: [R] Quickest way to match two vectors besides %in%?
?match

    > x
      X1 X2
    1  1  5
    2  2  6
    3  3  7
    4  4  8
    > y
      Y1 Y4
    1  1  8
    2  2  9
    3  3 10
    4  4 11
    5  1 12
    6  2 13
    7  3 14
    8  4 15
    > y.orig <- y # backup
    > y$Y4 <- x$X2[match(y$Y1, x$X1)]
    > y
      Y1 Y4
    1  1  5
    2  2  6
    3  3  7
    4  4  8
    5  1  5
    6  2  6
    7  3  7
    8  4  8

HTH,
Weiwei

On 11/8/05, Pete Cap wrote:
> I have two data frames, X (48469,2) and Y (79771,5). X[,1] contains distinct values of Y[,2]. I want to match values in X[,1] and Y[,2], then take the corresponding value in X[,2] and place it in Y[,4].
> [...]

--
Weiwei Shi, Ph.D
Did you always know? No, I did not. But I believed...
---Matrix III
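One caveat with the one-line match() assignment: any key in y that is missing from x comes back as NA and would overwrite the existing value. A hedged variant that leaves unmatched rows alone could look like the sketch below; the data and the column names (`key`, `new_value`, `value`) are hypothetical.

```r
# Hypothetical data: key 9 in Y has no counterpart in X.
X <- data.frame(key = c(1, 2, 3), new_value = c(50, 60, 70))
Y <- data.frame(key = c(2, 1, 2, 9, 3), value = c(0, 0, 0, 0, 0))

# For each row of Y, the row of X holding the same key (NA if none).
idx <- match(Y$key, X$key)

# Only touch rows of Y that actually found a partner in X.
hit <- !is.na(idx)
Y$value[hit] <- X$new_value[idx[hit]]

print(Y)
```

This is fully vectorised, keeps Y in its original row order, and relies on X's keys being distinct, as stated in the original question.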
Re: [R] Quickest way to match two vectors besides %in%?
Pete Cap wrote:
> Hello list, I have two data frames, X (48469,2) and Y (79771,5). X[,1] contains distinct values of Y[,2]. I want to match values in X[,1] and Y[,2], then take the corresponding value in X[,2] and place it in Y[,4]. So far I have been doing it like so:
>
>     for(i in 1:48469) { y[which(x[i,1]==y[,3]),4] <- x[i,2] }

I'm not sure, but isn't that a case where merge() can help?

cheers