G'day Nick,

On Wed, 19 Jan 2011 09:43:56 +0100
"Nick Sabbe" <nick.sa...@ugent.be> wrote:

> Given a dataframe
>
> dfr<-data.frame(c1=c("a", "b", NA, "a", "a"), c2=c("d", NA, "d", "e",
> "e"), c3=c("g", "h", "i", "j", "k"))
>
> I would like to have a dataframe with all (unique) combinations of
> all the factors present.

Easy:

R> expand.grid(lapply(dfr, levels))
   c1 c2 c3
1   a  d  g
2   b  d  g
3   a  e  g
4   b  e  g
5   a  d  h
6   b  d  h
7   a  e  h
8   b  e  h
9   a  d  i
10  b  d  i
11  a  e  i
12  b  e  i
13  a  d  j
14  b  d  j
15  a  e  j
16  b  e  j
17  a  d  k
18  b  d  k
19  a  e  k
20  b  e  k

> In fact, I would like a simple solution for these two cases: given
> the three factor columns above, I would like both all _possible_
> combinations of the factor levels, and all _present_ combinations of
> the factor levels (e.g. if I would do this for the first 4 rows of
> dfr, it would contain no combinations with c3="k").

R> dfrpart <- lapply(dfr[1:4,], factor)
R> expand.grid(lapply(dfrpart, levels))
   c1 c2 c3
1   a  d  g
2   b  d  g
3   a  e  g
4   b  e  g
5   a  d  h
6   b  d  h
7   a  e  h
8   b  e  h
9   a  d  i
10  b  d  i
11  a  e  i
12  b  e  i
13  a  d  j
14  b  d  j
15  a  e  j
16  b  e  j

> It would also be nice to be able to choose whether or not NA's are
> included.

R> expand.grid(lapply(dfrpart, function(x) c(levels(x),
+                                    if(any(is.na(x))) NA else NULL)))
     c1   c2 c3
1     a    d  g
2     b    d  g
3  <NA>    d  g
4     a    e  g
5     b    e  g
6  <NA>    e  g
7     a <NA>  g
8     b <NA>  g
9  <NA> <NA>  g
10    a    d  h
11    b    d  h
....

HTH.

Cheers,

        Berwin

========================== Full address ============================
Berwin A Turlach                      Tel.: +61 (8) 6488 3338 (secr)
School of Maths and Stats (M019)            +61 (8) 6488 3383 (self)
The University of Western Australia   FAX : +61 (8) 6488 1028
35 Stirling Highway
Crawley WA 6009                e-mail: ber...@maths.uwa.edu.au
Australia                        http://www.maths.uwa.edu.au/~berwin
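
A note for anyone running the code above on a current R: since R 4.0.0, data.frame() no longer turns character columns into factors by default, so levels() returns NULL for every column unless the conversion is requested. The sketch below assumes such a version and passes stringsAsFactors = TRUE explicitly. It also shows one possible reading of "all _present_ combinations", namely the rows that actually occur, using unique() and na.omit(). The names all_combos, observed and observed_complete are made up for this illustration and do not appear in the thread.

```r
# Assumption: R >= 4.0.0, so factor conversion has to be requested explicitly.
dfr <- data.frame(c1 = c("a", "b", NA, "a", "a"),
                  c2 = c("d", NA, "d", "e", "e"),
                  c3 = c("g", "h", "i", "j", "k"),
                  stringsAsFactors = TRUE)

# All possible combinations of the factor levels (NA is not a level).
all_combos <- expand.grid(lapply(dfr, levels))

# Combinations that actually occur as rows, with NA treated as a value.
observed <- unique(dfr)

# The same, but dropping any row that contains an NA.
observed_complete <- unique(na.omit(dfr))
```

Whether unique() or the expand.grid() call on the re-levelled subset is the right tool depends on which meaning of "present" is intended: the former lists observed rows only, the latter crosses every level that still occurs in the subset.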