>>>>> Ben Bolker >>>>> on Mon, 27 Dec 2021 09:43:42 -0500 writes:
> I agree that it seems non-intuitive (I can't think of a
> design reason for it to look this way), but I'd like to
> stress that it's *not* an information leak; the
> predictions of the model are independent of the
> parameterization, which is all this issue affects. In a
> worst case there might be some unfortunate effects on
> numerical stability if the data-dependent bases are
> computed on a very different set of data than the model
> fitting actually uses.

> I've attached a suggested documentation patch (I hope
> it makes it through to the list; if not, I can add it to
> the body of a message.)

It did make it through; thank you, Ben!

(After adding two forgotten '}') I've committed the help file
additions to the R sources (R-devel) in svn r81434.

Thanks again and
"Happy New Year" to all readers,
Martin

> On 12/26/21 8:35 PM, Balise, Raymond R wrote:
>> Hello R folks,
>>
>> Today I noticed that using the subset argument in lm()
>> with a polynomial gives a different result than using the
>> polynomial when the data has already been subsetted. This
>> was not at all intuitive for me. You can see an example
>> here:
>> https://stackoverflow.com/questions/70490599/why-does-lm-with-the-subset-argument-give-a-different-answer-than-subsetting-i
>>
>> If this is a design feature that you don’t think should
>> be fixed, can you please include it in the documentation
>> and explain why it makes sense to figure out the
>> orthogonal polynomials on the entire dataset? This feels
>> like a serious leak of information when evaluating train
>> and test datasets in a statistical learning framework.
>>
>> Ray
>>
>> Raymond R. Balise, PhD
>> Assistant Professor
>> Department of Public Health Sciences, Biostatistics
>>
>> University of Miami, Miller School of Medicine
>> 1120 N.W. 14th Street
>> Don Soffer Clinical Research Center - Room 1061
>> Miami, Florida 33136
>>
>> ______________________________________________
>> R-devel@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel

> --
> Dr. Benjamin Bolker
> Professor, Mathematics & Statistics and Biology, McMaster University
> Director, School of Computational Science and Engineering
> Graduate chair, Mathematics & Statistics
> [DELETED ATTACHMENT external: BenB_lm-subset.patch, plain text]
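
For readers who want to check Ben's point themselves, here is a minimal sketch. It assumes simulated data, and the names `d`, `keep`, `fit_subset_arg`, `fit_presubset` and `newd` are illustrative only; none of them come from the thread or from the Stack Overflow question. It fits the same quadratic two ways, once via the `subset` argument and once on data subsetted beforehand, then compares the coefficients and the predictions.

```r
set.seed(1)  # arbitrary seed, only for reproducibility

# Simulated data (an assumption; not the data from the original question)
d <- data.frame(x = runif(40, 0, 10))
d$y <- 1 + 2 * d$x - 0.3 * d$x^2 + rnorm(40)
keep <- d$x > 3  # hypothetical rule selecting the rows to fit on

# (a) subset argument: poly() is evaluated on all rows of d, then rows are dropped
fit_subset_arg <- lm(y ~ poly(x, 2), data = d, subset = keep)

# (b) data subsetted first: poly() only ever sees the kept rows
fit_presubset <- lm(y ~ poly(x, 2), data = d[keep, ])

# The coefficients refer to different orthogonal bases, so they need not agree
coefs_equal <- isTRUE(all.equal(unname(coef(fit_subset_arg)),
                                unname(coef(fit_presubset))))

# Both bases span the same quadratic space, so predictions should agree
newd <- data.frame(x = seq(0, 10, by = 0.5))
preds_equal <- isTRUE(all.equal(unname(predict(fit_subset_arg, newd)),
                                unname(predict(fit_presubset, newd))))

print(c(coefs_equal = coefs_equal, preds_equal = preds_equal))
```

If the argument above holds, the two fits should report different coefficients but matching predictions: only the parameterization depends on which rows `poly()` saw, not the fitted curve.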