Dea-R community. I'd like to draw your attention to an issue I have recently encountered while doing my current data analysis.
I've got an unexpected (to me) result from the command: > augPred(lmList(my.object)), 'my.object' being a grouped data frame of class: > class(my.object) [1] "nfnGroupedData" "nfGroupedData" "groupedData" "data.frame" The problem is more easily explained by showing these two to Trellis plot: > plot(my.object,some_options) > plot(augPred(lmList(my.object)),some_options) [link to output: http://www.palug.net/Members/jabba ] Clearly the problem is that the predictions are correct, but the responses in the ``original'' part of the augPred data frame are in the wrong order, i.e. the response order does not match the grouping factor order (not sure this is in correct English, sorry). Debugging the function 'augPred.lmList' I've managed to understand that the problem resides in the following function calls: > debug(nlme:::augPred.lmList) > fm1.lis <- lmList(my.object) > augPred(fm1.lis) debugging in: augPred.lmList(fm1.lis) debug: { ..... primary <- getCovariate(data) ..... groups <- getGroups(object) ..... orig <- data.frame(primary,groups, getResponse(object)) ..... } This approach works well on other datasets, such us the Orthodont data, but not for mine because, i guess, my rows are not first ordered by subject (subject is my grouping factor). In fact: > getGroups(fm1.lis)[1:5] [1] CBT.1 CBT.2 CBT.3 CBT.4 CBT.5 34 Levels: CBT.14 < CBT.2 < CBT.11 < CBT.13 < CBT.12 < CBT.5 < ... < V.6 > getResponse(fm1.lis)[1:5] CBT.1 CBT.1 CBT.1 CBT.1 CBT.2 34 32 29 27 28 My question: is there a simple way to get the proper result (modify the dataset or modify the function)? RTFM responses are welcomed. My (humble?) opinion: Neither Pinheiro-Bates (2000) nor online documentation, report the need for a particular row order in the costruction of grouped data objects, and i think it should be better do not rely on it. Or, at least, to document it somewhere. I want to thank the R-core team and the entire R community for this amazingly efficient and versatile software, and i am very proud that the (not only) statistical state-of-the-art software have a ``gpl distribution'' (which have many desirable properties ;-) ). Cheers. +---------------------------+ |Relevant system information| +---------------------------+ > R.version _ platform i486-pc-linux-gnu arch i486 os linux-gnu system i486, linux-gnu status major 2 minor 7.2 year 2008 month 08 day 25 svn rev 46428 language R version.string R version 2.7.2 (2008-08-25) > sessionInfo() R version 2.7.2 (2008-08-25) i486-pc-linux-gnu locale: LC_CTYPE=it_IT;LC_NUMERIC=C;LC_TIME=it_IT;LC_COLLATE=it_IT;LC_MONETARY=C;LC_MESSAGES=it_IT;LC_PAPER=it_IT;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=it_IT;LC_IDENTIFICATION=C attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] MASS_7.2-44 lattice_0.17-15 nlme_3.1-89 loaded via a namespace (and not attached): [1] grid_2.7.2 tools_2.7.2 -- Marco Barbàra - Undergraduate student in Statistics at the University of Palermo (Italy) - Only free-as-in-freedom software user - Emacs lover ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.