I've noticed inconsistent behavior with merge() when using all.x=TRUE.
After some digging I found the following test cases:

1) The snippet below doesn't work as expected, as the non-matching columns
of rows in a but not b take the value from the first matching row instead
of being NA:

--- Snip >>>
NUM <- 25
a <- data.frame(id=factor(letters[1:NUM]), qq=rep(NA, NUM), rr=rep(1.0,NUM))
b <- data.frame(id=c("e","a","f","y","x"))
b$mm <- as.vector(c(1,2,3.1,4.0,NA))%o%3.14
b$nn <- rep("from b", 5)
merge(a,b,by="id",all.x=TRUE)
<<< Snip ---

2) The modified snippet below works as expected:

--- Snip >>>
NUM <- 25
a <- data.frame(id=factor(letters[1:NUM]), qq=rep(NA, NUM), rr=rep(1.0,NUM))
b <- data.frame(id=c("e","a","f","y","x"))
b$nn <- rep("from b", 5)
b$mm <- as.vector(c(1,2,3.1,4.0,NA))%o%3.14
merge(a,b,by="id",all.x=TRUE)
<<< Snip ---

In src/library/base/R/merge.R:154, I see the following:

--- Snip >>>
for(i in seq_along(y)) {
    ## do it this way to invoke methods for e.g. factor
    if(is.matrix(y[[1]])) y[[1]][zap, ] <- NA
    else is.na(y[[i]]) <- zap
}
<<< Snip ---

Changing the '1's in the if statement to 'i's fixes this issue for me, i.e.:

--- Snip >>>
for(i in seq_along(y)) {
    ## do it this way to invoke methods for e.g. factor
    if(is.matrix(y[[i]])) y[[i]][zap, ] <- NA
    else is.na(y[[i]]) <- zap
}
<<< Snip ---

I'm actually not sure if the "if statement" is even needed (the "else" case
seems to handle matrices just fine).

--Russ Hamilton

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
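To make the effect of the index easier to see outside of merge(), here is a minimal sketch that runs the same loop on a small stand-in data frame. Note that zap_rows() is a hypothetical helper written only for this illustration (it is not part of base R), and the three-row y is an assumed stand-in for the internal y in merge.R. The last part also checks the "else" branch on a two-column matrix, since is.na<- with a plain index vector uses linear indexing.

```r
## Hypothetical helper (not in base R): runs the NA-filling loop over the
## rows in `zap`, testing either y[[1]] (as in merge.R) or y[[i]] (proposed).
zap_rows <- function(y, zap, fixed = TRUE) {
    for (i in seq_along(y)) {
        k <- if (fixed) i else 1L
        if (is.matrix(y[[k]])) y[[k]][zap, ] <- NA
        else is.na(y[[i]]) <- zap
    }
    y
}

## Stand-in for the internal y: a matrix column first, then a plain column.
y <- data.frame(nn = rep("from b", 3), stringsAsFactors = FALSE)
y$mm <- matrix(c(1, 2, 3), ncol = 1)
y <- y[c("mm", "nn")]

buggy <- zap_rows(y, zap = 3L, fixed = FALSE)  # nn[3] is left as "from b"
fixed <- zap_rows(y, zap = 3L, fixed = TRUE)   # nn[3] and mm[3, ] become NA

## The "else" branch alone on a two-column matrix: only the first column of
## row 2 is set to NA, because the index is applied linearly.
m <- matrix(as.numeric(1:6), ncol = 2)
is.na(m) <- 2L
```

If this sketch is representative, the "if" branch still matters for matrix columns with more than one column, even though the "else" branch is enough for the single-column matrix in the test case above.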