on 03/24/2007 11:13 PM Marc Schwartz said the following: > On Sat, 2007-03-24 at 22:16 -0400, Daniel Folkinshteyn wrote: >> on 03/24/2007 10:00 PM Marc Schwartz said the following: >>> On Sat, 2007-03-24 at 21:47 -0400, Daniel Folkinshteyn wrote: >>>> Greetings to all. >>>> I need to concatenate data frames that do not have all the same variable >>>> names, there is only a partial overlap in the variables. So, for >>>> example, if i have two data frames, a and b, that look like the following: >>>>> a >>>> a b >>>> 1 1 4 >>>> 2 2 5 >>>> 3 3 6 >>>> 4 4 7 >>>> 5 5 8 >>>>> b >>>> c a >>>> 1 1 10 >>>> 2 2 11 >>>> 3 3 12 >>>> 4 4 13 >>>> 5 5 14 >>>> >>>> i want to concatenate them by row, without any matching, so that the >>>> variables that are not available in all frames get NAs. The result >>>> should look like: >>>> >>>> a b c >>>> 1 1 4 NA >>>> 2 2 5 NA >>>> 3 3 6 NA >>>> 4 4 7 NA >>>> 5 5 8 NA >>>> 6 10 NA 1 >>>> 7 11 NA 2 >>>> 8 12 NA 3 >>>> 9 13 NA 4 >>>> 10 14 NA 5 >>>> >>>> rbind doesn't work, since it requires all variables to be matched >>>> between the two data frames. merge doesn't work, since it wants to >>>> /match/ by columns with the same name, and if matching by nothing, >>>> produces a cartesian product. >>>> >>>> is there a neat trick for doing this simply, or am i stuck with >>>> comparing variable lists and generating NAs manually? >>>> >>>> would appreciate any help! >>>> Daniel >>> You can use merge(): >>> >>>> a >>> a b >>> 1 1 4 >>> 2 2 5 >>> 3 3 6 >>> 4 4 7 >>> 5 5 8 >>> >>>> b >>> c a >>> 1 1 10 >>> 2 2 11 >>> 3 3 12 >>> 4 4 13 >>> 5 5 14 >>> >>> >>> Use 'a' as the common 'by' column and specify 'all = TRUE' so that >>> non-matching values of 'a' will be included in the result: >>> >>> >>>> merge(a, b, by = "a", all = TRUE) >>> a b c >>> 1 1 4 NA >>> 2 2 5 NA >>> 3 3 6 NA >>> 4 4 7 NA >>> 5 5 8 NA >>> 6 10 NA 1 >>> 7 11 NA 2 >>> 8 12 NA 3 >>> 9 13 NA 4 >>> 10 14 NA 5 >>> >> Thanks for your quick response. Unfortunately, this is still not quite >> what I have in mind (though maybe it's my fault for not making this too >> clear). Even if the two data frames happen to have some values of 'a' >> that match, I still want those records to remain separate, rather than >> merge. So, for instance, using merge will produce the following: >>> a = data.frame(a=1:5, b=4:8) >>> a >> a b >> 1 1 4 >> 2 2 5 >> 3 3 6 >> 4 4 7 >> 5 5 8 >>> b = data.frame(c=1:5, a=4:8) >>> b >> c a >> 1 1 4 >> 2 2 5 >> 3 3 6 >> 4 4 7 >> 5 5 8 >>> merge(a,b,by='a',all=T) >> a b c >> 1 1 4 NA >> 2 2 5 NA >> 3 3 6 NA >> 4 4 7 1 >> 5 5 8 2 >> 6 6 NA 3 >> 7 7 NA 4 >> 8 8 NA 5 >> >> whereas I would still want it to produce 10 separate rows, because they >> are separate observations, it's just that one of them happens to be >> missing a variable. > > OK. Not sure if this is the most efficient way of doing this, but this > seems to work, though through very limited testing. > > Basically what I am doing is using setdiff() to figure out which columns > are not common between the two data frames. In each case, I then use > sapply() to loop over the results, creating a new column of NA's that > will be cbind()ed back to the original data frame. > > Once that is done, the two new data frames, a.2 and b.2, will have > common columns and they can then be rbind()ed. > > > a.2 <- cbind(a, sapply(setdiff(colnames(b), colnames(a)), > function(x) x = rep(NA, nrow(a)))) > > b.2 <- cbind(b, sapply(setdiff(colnames(a), colnames(b)), > function(x) x = rep(NA, nrow(b)))) > >> a.2 > a b c > 1 1 4 NA > 2 2 5 NA > 3 3 6 NA > 4 4 7 NA > 5 5 8 NA > >> b.2 > c a b > 1 1 4 NA > 2 2 5 NA > 3 3 6 NA > 4 4 7 NA > 5 5 8 NA > > >> rbind(a.2, b.2) > a b c > 1 1 4 NA > 2 2 5 NA > 3 3 6 NA > 4 4 7 NA > 5 5 8 NA > 6 4 NA 1 > 7 5 NA 2 > 8 6 NA 3 > 9 7 NA 4 > 10 8 NA 5 > > > Hopefully that will work in more general cases, but I would validate > that.
Thanks Marc, that seems like a pretty elegant solution, and I have learned some useful stuff from it. However, I will go with rbind.fill(reshape) that was recommended by Hadley (thanks!), since it's just so darn easy. :) Thank you all, Daniel
signature.asc
Description: OpenPGP digital signature
______________________________________________ [email protected] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
