On Tue, 2003-06-24 at 18:07, Douglas Bates wrote: > In our recent workshop on "Multilevel Modeling in R" we discussed > handling data for multilevel modeling. An classic example of such > data are test scores of students grouped into schools. We may wish to > model the scores as functions of both student-level covariates and > school-level covariates. > > Such data are often organized in a multi-table format with a separate table > for each level of information. The MathAchieve and MathAchSchool data > frames in the nlme package are examples of such an organization. The > HLM software requires the data to be organized like this. To fit a > model in R we need to create a composite table by "joining" the > columns of the student-level and school-level tables, in the > relational database sense of "join". > > I have created a function to join the columns from two such frames > according to the values of a key column. In relational database terms > the key column must be a primary key for the second frame. I have > called this function 'cjoin', by analogy to cbind. > > You can try > > data(MathAchieve, package = 'nlme') > data(MathAchSchool, package = 'nlme') > cjoin(MathAchieve, MathAchSchool, "School") > cjoin(MathAchieve, MathAchSchool, "School", which = "Sector") > > as examples > > Several questions: > > - Am I duplicating existing functionality? > > - Is cjoin a good name for such a function? > > - Would this be useful in base?
Prof. Bates, Perhaps I am not seeing all of the details but a quick (perhaps too quick) review would suggest that merge(), which is in base, would at least be a parallel function. For example: data(MathAchieve, package = 'nlme') data(MathAchSchool, package = 'nlme') cj1 <- cjoin(MathAchieve, MathAchSchool, "School") mrg1 <- merge(MathAchieve, MathAchSchool, by = "School") dim(cj1) [1] 7185 12 dim(mrg1) [1] 7185 12 colnames(cj1) [1] "School" "Minority" "Sex" "SES" "MathAch" "MEANSES" [7] "Size" "Sector" "PRACAD" "DISCLIM" "HIMINTY" "MEANSES" colnames(mrg1) [1] "School" "Minority" "Sex" "SES" "MathAch" "MEANSES.x" [7] "Size" "Sector" "PRACAD" "DISCLIM" "HIMINTY" "MEANSES.y" > table(cj1 == mrg1) TRUE 86220 Note that cj1 appears to have duplicate colnames for "MEANSES", for which your function does issue a warning message, whereas mrg1 has appended ".x" and ".y" as default suffixes. In the second case, if I am reading the code correctly, the 'which' argument appears to restrict the columns retained from the second data frame after the key match, in this case "Sector". > cj2 <- cjoin(MathAchieve, MathAchSchool, "School", which = "Sector") > mrg2 <- mrg1[, c(1:6, 8)] > dim(cj2) [1] 7185 7 > dim(mrg2) [1] 7185 7 > colnames(cj2) [1] "School" "Minority" "Sex" "SES" [5] "MathAch" "MEANSES" "fr2[mm, which]" > colnames(mrg2) [1] "School" "Minority" "Sex" "SES" "MathAch" "MEANSES.x" [7] "Sector" > table(cj2 == mrg2) TRUE 50295 See ?merge for more information. Thoughts? Best regards, Marc Schwartz ______________________________________________ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-devel