Hi Don, In cases like: H_DF <- H_DF[-4,] tmp1 <- match( paste(T_DF$FY,T_DF$ID) , paste(H_DF$FY,H_DF$ID) )
H_DF$TT[tmp1] <- T_DF$TT #Error in `[<-.factor`(`*tmp*`, tmp1, value = c(2L, 2L, 2L)) : Probably, this works: H_DF$TT[tmp1[!is.na(tmp1)]] <- unique(T_DF$TT) A.K. On Wednesday, November 13, 2013 7:59 PM, "MacQueen, Don" <macque...@llnl.gov> wrote: Dan, Gabor's solution is of course good, but here's a solution that uses only base R capabilities, and doesn't sort as merge() does. Essentially the same as A.K.'s, but slightly more general. tmp1 <- match( paste(T_DF$FY,T_DF$ID) , paste(H_DF$FY,H_DF$ID) ) H_DF$TT[tmp1] <- T_DF$TT gg <- sqldf("select FY, ID, coalesce(t.TT, h.TT) TT from H_DF h left join T_DF t using(FY, ID)") > for (nm in names(H_DF)) print(all.equal(H_DF[[nm]], gg[[nm]])) [1] TRUE [1] TRUE [1] TRUE It could be made into a one-liner.It would probably break if TT doesn't have the same factor levels in both H_DF and T_DF. As an aside, I suspect that nowadays match() is generally under-appreciated among R users as a whole. -Don -- Don MacQueen Lawrence Livermore National Laboratory 7000 East Ave., L-627 Livermore, CA 94550 925-423-1062 On 11/11/13 5:20 PM, "Gabor Grothendieck" <ggrothendi...@gmail.com> wrote: >On Mon, Nov 11, 2013 at 8:04 PM, Lopez, Dan <lopez...@llnl.gov> wrote: >> Below is how I am currently doing this. Is there a more efficient way >>to do this? >> The scenario is that I have two dataframes of different sizes. I need >>to update one binary factor variable in one of those dataframes by >>matching on two variables. If there is no match keep as is otherwise >>update. Also the variable being update, TT in this case should remain a >>binary factor variable (levels='HC','TER') >> >> HTDF2<-merge(H_DF,T_DF,by=c("FY","ID"),all.x=T) >> >>HTDF2$TT<-factor(ifelse(is.na(HTDF2$TT.y),HTDF2$TT.x,HTDF2$TT.y),labels=c >>("HC","TER")) >> HTDF2<-HTDF2[,-(3:4)] >> >> >> # REPRODUCIBLE EXAMPLE DATA FOR ABOVE.. >>> dput(H_DF) >> structure(list(FY = structure(c(1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, >> 5L), .Label = c("FY09", "FY10", "FY11", "FY12", "FY13"), class = >>"factor"), >> ID = c(1, 1, 1, 1, 2, 2, 2, 2, 2), TT = structure(c(1L, 1L, >> 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("HC", "TER"), class = >>"factor")), .Names = c("FY", >> "ID", "TT"), class = "data.frame", row.names = c(1L, 2L, 3L, >> 4L, 6L, 7L, 9L, 10L, 11L)) >>> dput(T_DF) >> structure(list(FY = structure(c(4L, 2L, 5L), .Label = c("FY09", >> "FY10", "FY11", "FY12", "FY13"), class = "factor"), ID = c(1, >> 2, 2), TT = structure(c(2L, 2L, 2L), .Label = c("HC", "TER"), class = >>"factor")), .Names = c("FY", >> "ID", "TT"), row.names = c(5L, 8L, 12L), class = "data.frame") >> > >Here is an sqldf solution: > >> library(sqldf) >> sqldf("select FY, ID, coalesce(t.TT, h.TT) TT from H_DF h left join >>T_DF t using(FY, ID)") > FY ID TT >1 FY09 1 HC >2 FY10 1 HC >3 FY11 1 HC >4 FY12 1 TER >5 FY09 2 HC >6 FY10 2 TER >7 FY11 2 HC >8 FY12 2 HC >9 FY13 2 TER > > >-- >Statistics & Software Consulting >GKX Group, GKX Associates Inc. >tel: 1-877-GKX-GROUP >email: ggrothendieck at gmail.com > >______________________________________________ >R-help@r-project.org mailing list >https://stat.ethz.ch/mailman/listinfo/r-help >PLEASE do read the posting guide >http://www.R-project.org/posting-guide.html >and provide commented, minimal, self-contained, reproducible code. ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.