Re: [Rd] Should there be a confint.mlm ?

2018-07-21 Thread steven pav
I would welcome the fixes suggested by Martin. I did not think my
implementation was pretty, but didn't want to suggest a bugfix without
submitting code.

Regarding the idea of a `stderr` function, I am all for potential
efficiency gains, but I suspect that in many situations `vcov` is the
result of a matrix inversion. This might mean an 'efficient computation of
only the diagonal of the inverse of a matrix' function would be required to
achieve widespread use. (I am not aware of such an algorithm, but that may
just be my ignorance.)
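For what it's worth, when vcov() ultimately comes from a Cholesky factorization (as in least squares), the diagonal of the inverse can be computed without forming the full inverse. A minimal sketch in R (illustrative only; not claiming this is the special-purpose algorithm alluded to above):

```r
## If A = t(R) %*% R with R upper triangular (A symmetric positive definite),
## then A^{-1} = R^{-1} %*% t(R^{-1}), so diag(A^{-1}) = rowSums((R^{-1})^2).
set.seed(42)
A <- crossprod(matrix(rnorm(25), 5))     # a 5 x 5 s.p.d. matrix
R <- chol(A)                             # Cholesky factor: A = t(R) %*% R
Rinv <- backsolve(R, diag(nrow(R)))      # triangular solve for R^{-1}
d <- rowSums(Rinv^2)                     # diag of A^{-1}, no full solve(A)
stopifnot(all.equal(d, diag(solve(A))))  # agrees with explicit inversion
```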

Again, thanks for patching `confint.lm`.
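As a sketch of what the stderr() generic mentioned below might look like (std_err is a hypothetical name used here, since base R already uses stderr() for the standard-error connection):

```r
## Hypothetical generic for coefficient standard errors; named std_err to
## avoid clashing with base::stderr(), the standard-error connection.
std_err <- function(object, ...) UseMethod("std_err")
std_err.default <- function(object, ...) sqrt(diag(vcov(object)))

## Methods for specific classes could compute the diagonal more cheaply.
fit <- lm(dist ~ speed, data = cars)
stopifnot(all.equal(unname(std_err(fit)),
                    unname(summary(fit)$coefficients[, "Std. Error"])))
```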


On Fri, Jul 20, 2018 at 9:06 AM, Martin Maechler  wrote:

> > steven pav
> > on Thu, 19 Jul 2018 21:51:07 -0700 writes:
>
> > It seems that confint.default returns an empty data.frame
> > for objects of class mlm. For example:
>
> Not quite: Note that 'mlm' objects are also 'lm' objects, and so
> it is confint.lm() which is called here and fails.
>
> > For example:
> >
> > ```
> > nobs <- 20
> > set.seed(1234)
> > # some fake data
> > datf <-
> > data.frame(x1=rnorm(nobs),x2=runif(nobs),y1=rnorm(nobs),y2=rnorm(nobs))
> > fitm <- lm(cbind(y1,y2) ~ x1 + x2,data=datf)
> > confint(fitm)
> > # returns:
> >  2.5 % 97.5 %
> > ```
> >
> > I have seen proposed workarounds on stackoverflow and elsewhere, but
> > suspect this should be fixed in the stats package.
>
> I agree.
> It may be nicer to tweak confint.lm() instead, though.
>
> I'm looking into doing that.
>
> > A proposed implementation would be:
> >
> > ```
> > # needed to run the code below; stats does not export format.perc:
> > format.perc <- stats:::format.perc
>
> or better (mainly for aesthetic reasons), use
>
>   environment(confint.mlm) <- asNamespace("stats")
>
> after defining  confint.mlm [below]
>
> > # compute confidence intervals for mlm object.
> > confint.mlm <- function (object, level = 0.95, ...) {
> >   cf <- coef(object)
> >   ncfs <- as.numeric(cf)
> >   a <- (1 - level)/2
> >   a <- c(a, 1 - a)
> >   fac <- qt(a, object$df.residual)
> >   pct <- format.perc(a, 3)
> >   ses <- sqrt(diag(vcov(object)))
>
> BTW --- and this is a diversion ---  This is nice mathematically
> (and used in other places, also in "base R" I think)
> but in principle is a waste:  Computing a full
> k x k matrix and then throwing away all but the length-k
> diagonal ...
> In the past I had contemplated but never RFC'ed or really
> implemented a stderr() generic with default method
>
>stderr.default <- function(object) sqrt(diag(vcov(object)))
>
> but allow non-default methods to be smarter and hence more efficient.
>
> >   ci <- ncfs + ses %o% fac
> >   setNames(data.frame(ci),pct)
> > }
> >
> > # returning to the example above,
> > confint(fitm)
> > # returns:
> >                  2.5 % 97.5 %
> > y1:(Intercept) -1.2261 0.7037
> > y1:x1  -0.5100 0.2868
> > y1:x2  -2.7554 0.8736
> > y2:(Intercept) -0.6980 2.2182
> > y2:x1  -0.6162 0.5879
> > y2:x2  -3.9724 1.5114
> > ```
>
> I'm looking into a relatively small patch to confint.lm()
> *instead* of the confint.mlm() above
>
> Thank you very much, Steven, for your proposal!
> I will let you (and the R-devel audience) know the outcome.
>
> Best regards,
> Martin Maechler
> ETH Zurich  and  R Core Team
>
>




__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Library lib.loc Option Ignored for Dependencies

2018-07-21 Thread Dario Strbenac
Good day,

Now that I read about Renaud's problem, I realise that it is effectively the 
same as mine.

--
Dario Strbenac
University of Sydney
Camperdown NSW 2050
Australia


Re: [Rd] Library lib.loc Option Ignored for Dependencies

2018-07-21 Thread Benjamin Tyner

Not sure whether it is the same issue as was raised here:

https://stat.ethz.ch/pipermail/r-devel/2010-October/058729.html

but in any case perhaps the problem could be partially remedied at line
245 of src/library/base/R/library.R by passing lib.loc through to
.getRequiredPackages2(). Here is a patch (untested):


Index: src/library/base/R/library.R
===================================================================
--- src/library/base/R/library.R    (revision 74997)
+++ src/library/base/R/library.R    (working copy)
@@ -242,7 +242,7 @@
             pos <- 2
         } else pos <- npos
     }
-    .getRequiredPackages2(pkgInfo, quietly = quietly)
+    .getRequiredPackages2(pkgInfo, lib.loc = lib.loc, quietly = quietly)
 
     deps <- unique(names(pkgInfo$Depends))

 ## If the namespace mechanism is available and the package


On 07/21/2018 12:34 PM, Martin Maechler wrote:

Benjamin Tyner
 on Fri, 20 Jul 2018 19:42:09 -0400 writes:

 > Here's a trick/workaround; if lib.loc is the path to your
 > library, then prior to calling library(),

 >> environment(.libPaths)$.lib.loc <- lib.loc

Well, that is quite a "trick"  -- and potentially a pretty
dangerous one, not intended when making .libPaths a closure 


I do think that there is a problem with R's handling of R_LIBS
and other libPaths settings, notably when checking packages and,
within that, recompiling vignettes etc., where the R process
starts new versions of R via system() / system2() and then ends
up with the wrong .libPaths() settings.
I personally would be very happy if we got reprexes with
small fake packages -- possibly only easily reproducible on
unix-alikes -- so that we could address this as a bug (or more than
one) to be fixed.

Notably with the 3.4.x --> 3.5.0 transition and my/our tendency
of having quite a few paths in R_LIBS / lib.loc / ... I've been
bitten by problems when the wrong version of a package was taken
from the wrong library path 

Martin


 >> 
 >> Good day,
 >>
 >> If there's a library folder of the latest R packages and
 >> a particular package from it is loaded using the lib.loc
 >> option, the dependencies of that package are still
 >> attempted to be loaded from another folder of older
 >> packages specified by R_LIBS, which may cause errors
 >> about version requirements not being met. The
 >> documentation of the library function doesn't explain
 >> what the intended result is in such a case, but it could
 >> reasonably be expected that R would also load the
 >> dependencies from the user-specified lib.loc folder.
 >>
 >> --
 >> Dario Strbenac University of Sydney Camperdown NSW 2050
 >> Australia




Re: [Rd] Model formulas with explicit references

2018-07-21 Thread Berry, Charles



> On Jul 20, 2018, at 3:05 PM, Lenth, Russell V  wrote:
> 
> Dear R-Devel,
> 
> I seem to no longer be able to access the bug-reporting system, so am doing 
> this by e-mail.
> 
> My report concerns models where variables are explicitly referenced (or is it 
> "dereferenced"?), such as:
> 
>cars.lm <- lm(mtcars[[1]] ~ factor(mtcars$cyl) + mtcars[["disp"]])
> 
> I have found that it is not possible to predict such models with new data. 
> For example:
> 
>> predict(cars.lm, newdata = mtcars[1:5, ])
>        1        2        3        4        5        6        7        8        9       10 
> 20.37954 20.37954 26.58543 17.70329 14.91157 18.60448 14.91157 25.52859 25.68971 20.17199 
>       11       12       13       14       15       16       17       18       19       20 
> 20.17199 17.21096 17.21096 17.21096 11.85300 12.18071 12.72688 27.38558 27.46750 27.59312 
>       21       22       23       24       25       26       27       28       29       30 
> 26.25500 16.05853 16.44085 15.18466 13.81922 27.37738 26.24954 26.93772 15.15735 20.78917 
>       31       32 
> 16.52278 26.23042 
> Warning message:
> 'newdata' had 5 rows but variables found have 32 rows 
> 
> Instead of returning 5 predictions, it returns the 32 original predicted 
> values. There is a warning message suggesting that something went wrong. This 
> tickled my curiosity, and hence this result:
> 
>> predict(cars.lm, newdata = data.frame(x = 1:32))
>        1        2        3        4        5        6        7        8        9       10 
> 20.37954 20.37954 26.58543 17.70329 14.91157 18.60448 14.91157 25.52859 25.68971 20.17199 
>       11       12       13       14       15       16       17       18       19       20 
> 20.17199 17.21096 17.21096 17.21096 11.85300 12.18071 12.72688 27.38558 27.46750 27.59312 
>       21       22       23       24       25       26       27       28       29       30 
> 26.25500 16.05853 16.44085 15.18466 13.81922 27.37738 26.24954 26.93772 15.15735 20.78917 
>       31       32 
> 16.52278 26.23042
> 
> Again, the new data are ignored, but there is no warning message, because the 
> previous warning was based only on a discrepancy between the number of rows and 
> the number of predictions. Indeed, the new data set makes no sense at all in 
> the context of this model.
> 
> At the root of this behavior is the fact that the model.frame function 
> ignores its data argument with such models. So instead of constructing a new 
> frame based on the new data, it just returns the original model frame.
> 

This produces what I think you intended:

> predict(cars.lm, newdata = list(mtcars = mtcars[1:5, ]))
       1        2        3        4        5 
20.37954 20.37954 26.58543 17.70329 14.91157 
>


> I am not really suggesting that you try to make these things work with models 
> when the formula is like this. Instead, I am hoping that it throws an actual 
> error message rather than just a warning, and that you be a little bit more 
> sophisticated than merely checking the number of rows. Both predict() with 
> newdata provided, and model.frame() with a data argument, should return an 
> informative error message that says that model formulas like this are not 
> supported with new data.

As you can see they are supported, but you have to make sure that the objects 
in the formula can be found in the newdata arg. If this is puzzling, try 

debugonce(predict.lm)
predict(cars.lm, newdata = mtcars[1:5, ])

and inspect the newdata object and terms(object). You should see why the terms 
in the formula are not found in newdata.

If you think that something like your idiom for the formula is required, maybe you 
should repost on r-help and say what you are trying to do with it. I expect 
you'll get some advice on how to reformulate your call.
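For reference, the conventional formulation (plain variable names in the formula, data supplied via the data argument) behaves as expected with newdata; a small sketch:

```r
## predict() can re-evaluate the formula's variables in newdata here,
## so 5 rows of newdata yield 5 predictions.
fit <- lm(mpg ~ factor(cyl) + disp, data = mtcars)
p <- predict(fit, newdata = mtcars[1:5, ])
stopifnot(length(p) == 5)
```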

HTH,

Chuck


Re: [Rd] Library lib.loc Option Ignored for Dependencies

2018-07-21 Thread Martin Maechler
> Benjamin Tyner 
> on Fri, 20 Jul 2018 19:42:09 -0400 writes:

> Here's a trick/workaround; if lib.loc is the path to your
> library, then prior to calling library(),

>> environment(.libPaths)$.lib.loc <- lib.loc

Well, that is quite a "trick"  -- and potentially a pretty
dangerous one, not intended when making .libPaths a closure 


I do think that there is a problem with R's handling of R_LIBS
and other libPaths settings, notably when checking packages and,
within that, recompiling vignettes etc., where the R process
starts new versions of R via system() / system2() and then ends
up with the wrong .libPaths() settings.
I personally would be very happy if we got reprexes with
small fake packages -- possibly only easily reproducible on
unix-alikes -- so that we could address this as a bug (or more than
one) to be fixed.

Notably with the 3.4.x --> 3.5.0 transition and my/our tendency
of having quite a few paths in R_LIBS / lib.loc / ... I've been
bitten by problems when the wrong version of a package was taken
from the wrong library path 

Martin


>> 
>> Good day,
>> 
>> If there's a library folder of the latest R packages and
>> a particular package from it is loaded using the lib.loc
>> option, the dependencies of that package are still
>> attempted to be loaded from another folder of older
>> packages specified by R_LIBS, which may cause errors
>> about version requirements not being met. The
>> documentation of the library function doesn't explain
>> what the intended result is in such a case, but it could
>> reasonably be expected that R would also load the
>> dependencies from the user-specified lib.loc folder.
>> 
>> --
>> Dario Strbenac University of Sydney Camperdown NSW 2050
>> Australia



Re: [Rd] base::mean not consistent about NA/NaN

2018-07-21 Thread Jan Gorecki
Thank you, Tomas, for the detailed explanation. Such a nice description deserves
to be included somewhere in the documentation, perhaps in R-internals.
Regards
Jan

On 18 Jul 2018 18:24, "Tomas Kalibera"  wrote:

Yes, the performance overhead of fixing this at the R level would be too
large and it would complicate the code significantly. The result of
binary operations involving NA and NaN is hardware dependent (the
propagation of the NaN payload) - on some hardware, it actually works the
way we would like - NA is returned - but on some hardware you get NaN or
sometimes NA and sometimes NaN. Also there are C compiler optimizations
re-ordering code, as mentioned in ?NaN. Then there are also external
numerical libraries that do not distinguish NA from NaN (NA is an R
concept). So I am afraid this is unfixable. The disclaimer mentioned by
Duncan is in ?NaN/?NA, which I think is ok - there are so many numerical
functions through which one might run into these problems that it would
be infeasible to document them all. Some functions in fact will preserve
NA, and we would not let NA turn into NaN unnecessarily, but the
disclaimer says it is something not to depend on.
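A small illustration of the R-level distinction (the hardware-dependent result of mixed NA/NaN arithmetic is deliberately not asserted):

```r
## NaN counts as missing, but NA is not NaN at the R level:
stopifnot(is.na(NaN))         # NaN is also "not available"
stopifnot(!is.nan(NA_real_))  # NA_real_ is a NaN bit-pattern in hardware,
                              # but is.nan() reports FALSE for it by design
## NA + NaN versus NaN + NA may differ across platforms; see ?NaN.
```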


Tomas


On 07/03/2018 11:12 AM, Jan Gorecki wrote:
> Thank you for the interesting examples.
> I would find it useful to document this behavior in `?mean` as well; while the
> `+` operator is also affected, the `sum` function is not.
> For mean, NA / NaN could be handled in the loop in summary.c. I assume that the
> performance penalty of a fix is the reason why this inconsistency still
> exists.
> Jan
>
> On Mon, Jul 2, 2018 at 8:28 PM, Barry Rowlingson <
> b.rowling...@lancaster.ac.uk> wrote:
>
>> And for a starker example of this (documented) inconsistency,
>> arithmetic addition is not commutative:
>>
>>   > NA + NaN
>>   [1] NA
>>   > NaN + NA
>>   [1] NaN
>>
>>
>>
>> On Mon, Jul 2, 2018 at 5:32 PM, Duncan Murdoch 
>> wrote:
>>> On 02/07/2018 11:25 AM, Jan Gorecki wrote:
 Hi,
 base::mean is not consistent in terms of handling NA/NaN.
 The mean should not depend on the order of its arguments, but currently it does.
>>> The result of mean() can depend on the order even with regular numbers.
>>> For example,
>>>
>>>   > x <- rep(c(1, 10^(-15)), 100)
>>>   > mean(sort(x)) - 0.5
>>> [1] 5.551115e-16
>>>   > mean(rev(sort(x))) - 0.5
>>> [1] 0
>>>
>>>
   mean(c(NA, NaN))
   #[1] NA
   mean(c(NaN, NA))
   #[1] NaN

 I created issue so in case of no replies here status of it can be
>> looked up
 at:
 https://bugs.r-project.org/bugzilla/show_bug.cgi?id=17441
>>> The help page for ?NaN says,
>>>
>>> "Computations involving NaN will return NaN or perhaps NA: which of
>>> those two is not guaranteed and may depend on the R platform (since
>>> compilers may re-order computations)."
>>>
>>> And ?NA says,
>>>
>>> "Numerical computations using NA will normally result in NA: a possible
>>> exception is where NaN is also involved, in which case either might
>>> result (which may depend on the R platform). "
>>>
>>> So I doubt if this inconsistency will be fixed.
>>>
>>> Duncan Murdoch
>>>
 Best,
 Jan

