> On Jul 14, 2017, at 9:50 AM, Martin Maechler <maech...@stat.math.ethz.ch> > wrote: > >>>>>> Martin Maechler <maech...@stat.math.ethz.ch> >>>>>> on Fri, 14 Jul 2017 16:30:50 +0200 writes: > >>>>>> Marc Schwartz <marc_schwa...@me.com> >>>>>> on Fri, 14 Jul 2017 06:57:26 -0500 writes: > >>>> On Jul 13, 2017, at 5:07 PM, Marc Schwartz <marc_schwa...@me.com> wrote: >>>> >>>> >>>>> On Jul 13, 2017, at 3:37 PM, Marc Schwartz <marc_schwa...@me.com> wrote: >>>>> >>>>> >>>>>> On Jul 13, 2017, at 3:22 PM, Duncan Murdoch <murdoch.dun...@gmail.com> >>>>>> wrote: >>>>>> >>>>>> On 13/07/2017 4:08 PM, Marc Schwartz wrote: >>>>>>> Hi All, >>>>>>> >>>>>>> As per the discussion today on R-Help: >>>>>>> >>>>>>> https://stat.ethz.ch/pipermail/r-help/2017-July/448132.html >>>>>>> >>>>>>> I am attaching a proposed patch for poly.Rd to provide clarifying >>>>>>> wording relative to naming the 'degree' argument explicitly, in the >>>>>>> case where the 'x' argument is a matrix, rather than a vector. >>>>>>> >>>>>>> This is based upon the svn trunk version of poly.Rd. >>>>>> >>>>>> I don't think this is the right fix. The use of the unnamed 2nd arg as >>>>>> degree happens whether the first arg is a matrix or not. >>>>>> >>>>>> I didn't read the whole thread in detail, but it appears there's a bug >>>>>> somewhere, in the report or in the poly() code or in the plsr() code. >>>>>> That bug should be reported on the bug list if it turns out to be in >>>>>> base R, and to the package maintainer if it is in plsr(). >>>>>> >>>>>> Duncan Murdoch >>>>> >>>>> >>>>> Duncan, >>>>> >>>>> Thanks for your reply. You only really need to read that last post in the >>>>> thread linked to above. >>>>> >>>>> I won't deny the possibility of a bug in poly(), relative to the handling >>>>> of 'x' as a matrix. The behavior occurs in the poly() function in a pure >>>>> stand alone fashion, without the need for plsr(): >>>>> >>>>> x1 <- runif(20) >>>>> x2 <- runif(20) >>>>> mx <- cbind(x1, x2) >>>>> >>>> >>>> <snip> >>>> >>>> Duncan, >>>> >>>> Tracing through the code for poly() using debug once with: >>>> >>>> poly(mx, 2) >>>> >>>> and then with: >>>> >>>> poly(mx, degree = 2) >>>> >>>> there is a difference in the transformation of 'mx' internally by the use >>>> of: >>>> >>>> if (is.matrix(x)) { >>>> m <- unclass(as.data.frame(cbind(x, ...))) >>>> return(do.call(polym, c(m, degree = degree, raw = raw, list(coefs = >>>> coefs)))) >>>> } >>>> >>>> >>>> In the first case, 'mx' ends up being transformed to: >>>> >>>> Browse[2]> m >>>> $x1 >>>> [1] 0.99056941 0.13953093 0.38965567 0.35353514 0.90838486 0.97552474 >>>> [7] 0.01135743 0.06537047 0.56207834 0.50554056 0.96653391 0.69533973 >>>> [13] 0.31333549 0.97488211 0.54952630 0.71747157 0.31164777 0.81694822 >>>> [19] 0.58641410 0.08858699 >>>> >>>> $x2 >>>> [1] 0.6628658 0.9221436 0.3162418 0.8494452 0.4665010 0.3403719 >>>> [7] 0.4040692 0.4916650 0.9091161 0.2956006 0.3454689 0.3331070 >>>> [13] 0.8788974 0.5614636 0.7794396 0.2304009 0.6566537 0.6875646 >>>> [19] 0.5110733 0.4122336 >>>> >>>> $V3 >>>> [1] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 >>>> >>>> attr(,"row.names") >>>> [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 >>>> >>>> Thus, when do.call() is used, m$V3 is passed as the 'x' argument on the >>>> third iteration, essentially resulting in: >>>> >>>>> polym(rep(2, 20), degree = 2) >>>> Error in poly(dots[[1L]], degree, raw = raw, simple = raw && nd > 1) : >>>> 'degree' must be less than number of unique points >>>> >>>> >>>> Note also that in this case, 'dots', which is the result of using >>>> list(...) on the initial call, is: >>>> >>>> Browse[2]> dots >>>> [[1]] >>>> [1] 2 >>>> >>>> >>>> In the second case: >>>> >>>> Browse[2]> m >>>> $x1 >>>> [1] 0.99056941 0.13953093 0.38965567 0.35353514 0.90838486 0.97552474 >>>> [7] 0.01135743 0.06537047 0.56207834 0.50554056 0.96653391 0.69533973 >>>> [13] 0.31333549 0.97488211 0.54952630 0.71747157 0.31164777 0.81694822 >>>> [19] 0.58641410 0.08858699 >>>> >>>> $x2 >>>> [1] 0.6628658 0.9221436 0.3162418 0.8494452 0.4665010 0.3403719 >>>> [7] 0.4040692 0.4916650 0.9091161 0.2956006 0.3454689 0.3331070 >>>> [13] 0.8788974 0.5614636 0.7794396 0.2304009 0.6566537 0.6875646 >>>> [19] 0.5110733 0.4122336 >>>> >>>> attr(,"row.names") >>>> [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 >>>> >>>> >>>> So, there is no m$V3. >>>> >>>> Note also that 'dots' ends up being: >>>> >>>> Browse[2]> dots >>>> list() >>>> >>>> >>>> In both cases, 'degree' is indeed 2, but the result of 'list(...)' on the >>>> initial function call is quite different. >>>> >>>> So, I may be hypo-caffeinated, but if there is a bug here, it may be due >>>> to the way in which cbind() is being called in the code above, where the >>>> three dots are being used? >>>> >>>> I can replicate the presumably correct behavior by using: >>>> >>>> m <- unclass(as.data.frame(cbind(x))) >>>> >>>> instead of: >>>> >>>> m <- unclass(as.data.frame(cbind(x, ...))) >>>> >>>> But I am not sure if removing the three dots in the cbind() call may have >>>> other unintended consequences. >>>> >>>> Regards, >>>> >>>> Marc > > >>> Duncan, > >>> Some additional information here. >>> Reviewing the source code for the function in SVN: > >>> https://svn.r-project.org/R/trunk/src/library/stats/R/contr.poly.R > >>> there is a relevant comment in the code: > >>> if(is.matrix(x)) { ## FIXME: fails when combined with 'unnamed degree' above >>> m <- unclass(as.data.frame(cbind(x, ...))) >>> return(do.call(polym, c(m, degree = degree, raw = raw, >>> list(coefs=coefs)))) >>> } > > >>> A version review would suggest that the above comment was added to the code >>> back in 2015. > >> Yes, by me, possibly here : > >> $ svn log -v -c68727 >> ------------------------------------------------------------------------ >> r68727 | maechler | 2015-07-23 16:14:59 +0200 (Thu, 23 Jul 2015) | 1 line >> Changed paths: >> M /trunk/doc/NEWS.Rd >> M /trunk/src/library/stats/R/contr.poly.R >> M /trunk/src/library/stats/man/poly.Rd >> M /trunk/tests/Examples/stats-Ex.Rout.save >> M /trunk/tests/reg-tests-1c.R > >> poly(), polym() now work better notably for prediction >> ------------------------------------------------------------------------ >> $ svn-diffB -c68727 doc/NEWS.Rd >> Index: doc/NEWS.Rd >> =================================================================== >> 126a127,133 >>> >>> \item \code{polym()} gains a \code{coefs = NULL} argument and >>> returns class \code{"poly"} just like \code{poly()} which gets a >>> new \code{simple=FALSE} option. They now lead to correct >>> \code{predict()}ions, e.g., on subsets of the original data. >>> %% see https://stat.ethz.ch/pipermail/r-devel/2015-July/071532.html > > >>> So it would appear that the behavior being discussed here is known. > >> Indeed! I remember to have spent quite a few hours with the >> code and its different uses before committing that patch. > >>> I am still confused by the need for the '...' in the call to cbind(), which >>> as far as I can tell, has been in the code at least back to 2003, when the >>> poly() code was split from base. > >>> I am not sure why one would want to pass on other '...' arguments to >>> cbind(), but I am presumably missing something here. > >> Yes, I think passing the '...' is important there... >> OTOH, I'm almost sure that I wrote the 'FIXME' because I thought >> one should be able to do things better. >> So, I'm happy to e-talk to you about how to get rid of the FIXME >> and still remain back-compatible: Notably with the paragraph in ?poly >> |> Details: >> |> >> |> Although formally ‘degree’ should be named (as it follows ‘...’), >> |> an unnamed second argument of length 1 will be interpreted as the >> |> degree, such that ‘poly(x, 3)’ can be used in formulas. > > As a matter of fact, a patch seems very simple, and I am > testing it now. > > Won't have much more time today, but will return "on this channel" > later, maybe tomorrow. > > Martin
Martin, Thanks for taking the time to look at this! Marc ______________________________________________ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel