Re: [R] Fwd: Strange results : bootrstrp CIs

2024-01-14 Thread varin sacha via R-help
Dear R-experts,

I really thank you all for your responses.

Best,



Le dimanche 14 janvier 2024 à 10:22:12 UTC+1, Duncan Murdoch 
 a écrit : 





On 13/01/2024 8:58 p.m., Rolf Turner wrote:
> On Sat, 13 Jan 2024 17:59:16 -0500
> Duncan Murdoch  wrote:
> 
> 
> 
>> My guess is that one of the bootstrap samples had a different
>> selection of countries, so factor(Country) had different levels, and
>> that would really mess things up.
>>
>> You'll need to decide how to handle that:  If you are trying to
>> estimate the coefficient for Italy in a sample that contains no data
>> from Italy, what should the coefficient be?
> 
> Perhaps NA?  Ben Bolker conjectured that boot() might be able to handle
> this.  Getting the NAs into the coefficients is a bit of a fag, but.  I
> tried:

My question was really intended as a statistical question.  From a 
statistical perspective, if I have a sampling scheme that sometimes 
generates sample size 0, should my CI be (-Inf, Inf) for high enough 
confidence level?

A Bayesian might say that inference should be entirely based on the 
prior in the case of no relevant data.  You could get similar numerical 
results by adding some fake data to every bootstrap sample, e.g. a 
single weighted observation for each country at your prior mean for that 
country, with weight chosen to match the strength of the prior.  But 
Bayesian methods don't give confidence intervals, they give credible 
intervals, and those aren't the same thing even if they are sometimes 
numerically similar.


Duncan Murdoch


> func <- function(data, idx) {
> clyde <- coef(lm(Score~ Time + factor(Country),data=data))
> ccc <- coef(lm(Score~ Time + factor(Country),data=data[idx,]))
> urk <- rep(NA,length(clyde))
> names(urk) <-names(clyde)
> urk[names(ccc)] <- ccc
> urk
> }
> 
> It produced a result:
> 
>>> set.seed(42)
>>> B= boot(e, func, R=1000)
>> B
>>
>> ORDINARY NONPARAMETRIC BOOTSTRAP
>>
>>
>> Call:
>> boot(data = e, statistic = func, R = 1000)
>>
>>
>> Bootstrap Statistics :
>>        original    bias    std. error
>> t1*  609.62500  3.6204052    95.39452
>> t2*  -54.81250 -1.6624704    36.32911
>> t3*  -41.3 -2.7337992  100.72113
>> t4*  -96.0 -1.0995718    99.78864
>> t5* -126.0 -0.6548886    63.47076
>> t6*  -26.3 -1.6516683    87.80483
>> t7*  -15.7 -0.8391170    91.72467
>> t8*  -21.7 -5.4544013    83.69211
>> t9*  18.3 -0.7711001    85.57278
> 
> However I have no idea if the result is correct, or even meaningful. I
> have no idea what I'm doing.  Just hammering and hoping. ️
> 
> 
> 
> cheers,
> 
> Rolf
>

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Fwd: Strange results : bootrstrp CIs

2024-01-14 Thread Duncan Murdoch

On 13/01/2024 8:58 p.m., Rolf Turner wrote:

On Sat, 13 Jan 2024 17:59:16 -0500
Duncan Murdoch  wrote:




My guess is that one of the bootstrap samples had a different
selection of countries, so factor(Country) had different levels, and
that would really mess things up.

You'll need to decide how to handle that:  If you are trying to
estimate the coefficient for Italy in a sample that contains no data
from Italy, what should the coefficient be?


Perhaps NA?  Ben Bolker conjectured that boot() might be able to handle
this.  Getting the NAs into the coefficients is a bit of a fag, but.  I
tried:


My question was really intended as a statistical question.  From a 
statistical perspective, if I have a sampling scheme that sometimes 
generates sample size 0, should my CI be (-Inf, Inf) for high enough 
confidence level?


A Bayesian might say that inference should be entirely based on the 
prior in the case of no relevant data.  You could get similar numerical 
results by adding some fake data to every bootstrap sample, e.g. a 
single weighted observation for each country at your prior mean for that 
country, with weight chosen to match the strength of the prior.  But 
Bayesian methods don't give confidence intervals, they give credible 
intervals, and those aren't the same thing even if they are sometimes 
numerically similar.


Duncan Murdoch



func <- function(data, idx) {
clyde <- coef(lm(Score~ Time + factor(Country),data=data))
ccc <- coef(lm(Score~ Time + factor(Country),data=data[idx,]))
urk <- rep(NA,length(clyde))
names(urk) <-names(clyde)
urk[names(ccc)] <- ccc
urk
}

It produced a result:


set.seed(42)
B= boot(e, func, R=1000)

B

ORDINARY NONPARAMETRIC BOOTSTRAP


Call:
boot(data = e, statistic = func, R = 1000)


Bootstrap Statistics :
   original biasstd. error
t1*  609.62500  3.620405295.39452
t2*  -54.81250 -1.662470436.32911
t3*  -41.3 -2.7337992   100.72113
t4*  -96.0 -1.099571899.78864
t5* -126.0 -0.654888663.47076
t6*  -26.3 -1.651668387.80483
t7*  -15.7 -0.839117091.72467
t8*  -21.7 -5.454401383.69211
t9*   18.3 -0.771100185.57278


However I have no idea if the result is correct, or even meaningful. I
have no idea what I'm doing.  Just hammering and hoping. ️



cheers,

Rolf



__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Fwd: Strange results : bootrstrp CIs

2024-01-13 Thread Rolf Turner
On Sat, 13 Jan 2024 17:59:16 -0500
Duncan Murdoch  wrote:



> My guess is that one of the bootstrap samples had a different
> selection of countries, so factor(Country) had different levels, and
> that would really mess things up.
> 
> You'll need to decide how to handle that:  If you are trying to
> estimate the coefficient for Italy in a sample that contains no data
> from Italy, what should the coefficient be?

Perhaps NA?  Ben Bolker conjectured that boot() might be able to handle
this.  Getting the NAs into the coefficients is a bit of a fag, but.  I
tried:

func <- function(data, idx) {
clyde <- coef(lm(Score~ Time + factor(Country),data=data))
ccc <- coef(lm(Score~ Time + factor(Country),data=data[idx,]))
urk <- rep(NA,length(clyde))
names(urk) <-names(clyde)
urk[names(ccc)] <- ccc
urk
}

It produced a result:

> > set.seed(42)
> > B= boot(e, func, R=1000)
> B
> 
> ORDINARY NONPARAMETRIC BOOTSTRAP
> 
> 
> Call:
> boot(data = e, statistic = func, R = 1000)
> 
> 
> Bootstrap Statistics :
>   original biasstd. error
> t1*  609.62500  3.620405295.39452
> t2*  -54.81250 -1.662470436.32911
> t3*  -41.3 -2.7337992   100.72113
> t4*  -96.0 -1.099571899.78864
> t5* -126.0 -0.654888663.47076
> t6*  -26.3 -1.651668387.80483
> t7*  -15.7 -0.839117091.72467
> t8*  -21.7 -5.454401383.69211
> t9*   18.3 -0.771100185.57278

However I have no idea if the result is correct, or even meaningful. I
have no idea what I'm doing.  Just hammering and hoping. ️



cheers,

Rolf

-- 
Honorary Research Fellow
Department of Statistics
University of Auckland
Stats. Dep't. (secretaries) phone:
 +64-9-373-7599 ext. 89622
Home phone: +64-9-480-4619

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Fwd: Strange results : bootrstrp CIs

2024-01-13 Thread Duncan Murdoch

Sorry, didn't cc this to the list.


 Forwarded Message 
Subject: Re: [R] Strange results : bootrstrp CIs
Date: Sat, 13 Jan 2024 17:37:19 -0500
From: Duncan Murdoch 
To: varin sacha 

You can debug things like this by setting options(error = recover). That 
will drop into the debugger when the error occurs.  Examine t.star, r, 
and res[[r]] and you will likely see what the problem was.


My guess is that one of the bootstrap samples had a different selection 
of countries, so factor(Country) had different levels, and that would 
really mess things up.


You'll need to decide how to handle that:  If you are trying to estimate 
the coefficient for Italy in a sample that contains no data from Italy, 
what should the coefficient be?  (This is easier for Bayesians to 
handle:  we don't need data!)


Duncan

On 13/01/2024 5:22 p.m., varin sacha via R-help wrote:

Dear Duncan,
Dear Ivan,

I really thank you a lot for your response.
So, if I correctly understand your answers the problem is coming from this line:

coef(lm(Score~ Time + factor(Country)),data=data[idx,])

This line should be:
coef(lm(Score~ Time + factor(Country),data=data[idx,]))

If yes, now I get an error message (code here below)! So, it still does not 
work.

Error in t.star[r, ] <- res[[r]] :
   number of items to replace is not a multiple of replacement length


##
Score=c(345,564,467,675,432,346,476,512,567,543,234,435,654,411,356,658,432,345,432,345,
 345,456,543,501)
  
Country=c("Italy", "Italy", "Italy", "Turkey", "Turkey", "Turkey", "USA", "USA", "USA", "Korea", "Korea", "Korea", "Portugal", "Portugal", "Portugal", "UK", "UK", "UK", "Poland", "Poland", "Poland", "Austria", "Austria", "Austria")
  
Time=c(1,2,3,1,2,3,1,2,3,1,2,3,1,2,3,1,2,3,1,2,3,1,2,3)
  
e=data.frame(Score, Country, Time)
  
  
library(boot)

func= function(data, idx) {
coef(lm(Score~ Time + factor(Country),data=data[idx,]))
}
B= boot(e, func, R=1000)
  
boot.ci(B, index=2, type="perc")

#








Le samedi 13 janvier 2024 à 21:56:58 UTC+1, Ivan Krylov  a 
écrit :





В Sat, 13 Jan 2024 20:33:47 + (UTC)

varin sacha via R-help  пишет:


coef(lm(Score~ Time + factor(Country)),data=data[idx,])



Wrong place for the data=... argument. You meant to give it to lm(...),
but in the end it went to coef(...). Without the data=... argument, the
formula passed to lm() picks up the global variables inherited by the
func() closure.

Unfortunately, S3 methods really do have to ignore extra arguments they
don't understand if the class is to be extended, so coef.lm isn't
allowed to complain to you about it.



__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.