Re: [R] Strange results : bootrstrp CIs

2024-01-13 Thread Rolf Turner


On Sat, 13 Jan 2024 16:54:01 -0800
Bert Gunter  wrote:

> Well, this would seem to work:
> 
> e <- data.frame(Score = Score
>  , Country = factor(Country)
>  , Time = Time)
> 
> ncountry <- nlevels(e$Country)
> func= function(dat,idx) {
>if(length(unique(dat[idx,'Country'])) < ncountry) NA
>else coef(lm(Score~ Time + Country,data = dat[idx,]))
> }
> B <-  boot(e, func, R=1000)
> 
> boot.ci(B, index=2, type="perc")
> 
> Caveats:
> 1) boot.ci handles the NA's by omitting them, which of course gives a
> smaller resample and longer CI's than the value of R specified in the
> call to boot().
> 
> 2) I do not know if the *nice* statistical properties of the
> nonparametric bootstrap, e.g. asymptotic correctness, hold when
> bootstrap samples are produced in this way.  I leave that to wiser
> heads than me.



It seems to me that my shaganappi idea causes func() to return a vector
of coefficients with NAs corresponding to any missing levels of the
"Country" factor, whereas your idea causes it to return a scalar NA
whenever one or more of the levels of the "Country" factor is missing.

I have no idea what the implications of this are.  As I said before, I
have no idea what I am doing!

cheers,

Rolf

-- 
Honorary Research Fellow
Department of Statistics
University of Auckland
Stats. Dep't. (secretaries) phone:
 +64-9-373-7599 ext. 89622
Home phone: +64-9-480-4619

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Fwd: Strange results : bootrstrp CIs

2024-01-13 Thread Rolf Turner
On Sat, 13 Jan 2024 17:59:16 -0500
Duncan Murdoch  wrote:



> My guess is that one of the bootstrap samples had a different
> selection of countries, so factor(Country) had different levels, and
> that would really mess things up.
> 
> You'll need to decide how to handle that:  If you are trying to
> estimate the coefficient for Italy in a sample that contains no data
> from Italy, what should the coefficient be?

Perhaps NA?  Ben Bolker conjectured that boot() might be able to handle
this.  Getting the NAs into the coefficients is a bit of a fag, but.  I
tried:

func <- function(data, idx) {
clyde <- coef(lm(Score~ Time + factor(Country),data=data))
ccc <- coef(lm(Score~ Time + factor(Country),data=data[idx,]))
urk <- rep(NA,length(clyde))
names(urk) <-names(clyde)
urk[names(ccc)] <- ccc
urk
}

It produced a result:

> > set.seed(42)
> > B= boot(e, func, R=1000)
> B
> 
> ORDINARY NONPARAMETRIC BOOTSTRAP
> 
> 
> Call:
> boot(data = e, statistic = func, R = 1000)
> 
> 
> Bootstrap Statistics :
>   original biasstd. error
> t1*  609.62500  3.620405295.39452
> t2*  -54.81250 -1.662470436.32911
> t3*  -41.3 -2.7337992   100.72113
> t4*  -96.0 -1.099571899.78864
> t5* -126.0 -0.654888663.47076
> t6*  -26.3 -1.651668387.80483
> t7*  -15.7 -0.839117091.72467
> t8*  -21.7 -5.454401383.69211
> t9*   18.3 -0.771100185.57278

However I have no idea if the result is correct, or even meaningful. I
have no idea what I'm doing.  Just hammering and hoping. ️



cheers,

Rolf

-- 
Honorary Research Fellow
Department of Statistics
University of Auckland
Stats. Dep't. (secretaries) phone:
 +64-9-373-7599 ext. 89622
Home phone: +64-9-480-4619

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Strange results : bootrstrp CIs

2024-01-13 Thread Bert Gunter
Well, this would seem to work:

e <- data.frame(Score = Score
 , Country = factor(Country)
 , Time = Time)

ncountry <- nlevels(e$Country)
func= function(dat,idx) {
   if(length(unique(dat[idx,'Country'])) < ncountry) NA
   else coef(lm(Score~ Time + Country,data = dat[idx,]))
}
B <-  boot(e, func, R=1000)

boot.ci(B, index=2, type="perc")

Caveats:
1) boot.ci handles the NA's by omitting them, which of course gives a
smaller resample and longer CI's than the value of R specified in the call
to boot().

2) I do not know if the *nice* statistical properties of the nonparametric
bootstrap, e.g. asymptotic correctness, hold when bootstrap samples are
produced in this way.  I leave that to wiser heads than me.

Cheers,
Bert

On Sat, Jan 13, 2024 at 2:51 PM Ben Bolker  wrote:

>It took me a little while to figure this out, but: the problem is
> that if your resampling leaves out any countries (which is very likely),
> your model applied to the bootstrapped data will have fewer coefficients
> than your original model.
>
> I tried this:
>
> cc <- unique(e$Country)
> func <- function(data, idx) {
> coef(lm(Score~ Time + factor(Country, levels =cc),data=data[idx,]))
> }
>
> but lm() automatically drops coefficients for missing countries (I
> didn't think about it too hard, but I thought they might get returned as
> NA and that boot() might be able to handle that ...)
>
>If you want to do this I think you'll have to find a way to do a
> *stratified* bootstrap, restricting the bootstrap samples so that they
> always contain at least one sample from each country ... (I would have
> expected "strata = as.numeric(e$Country)" to do this, but it doesn't
> work the way I thought ... it tries to compute the statistics for *each*
> stratum ...)
>
>
>
> 
>
>   Debugging attempts:
>
> set.seed(101)
> options(error=recover)
> B= boot(e, func, R=1000)
>
>
> Error in t.star[r, ] <- res[[r]] :
>number of items to replace is not a multiple of replacement length
>
> Enter a frame number, or 0 to exit
>
> 1: boot(e, func, R = 1000)
>
> 
>
> Selection: 1
> Called from: top level
> Browse[1]> print(r)
> [1] 2
> Browse[1]> t.star[r,]
> [1] NA NA NA NA NA NA NA NA NA
>
> i[2,]
>   [1] 14 15 22 22 21 14  8  2 12 22 10 15  9  7  9 13 12 23  1 20 15  7
> 5 10
>
>
>
>
> On 2024-01-13 5:22 p.m., varin sacha via R-help wrote:
> > Dear Duncan,
> > Dear Ivan,
> >
> > I really thank you a lot for your response.
> > So, if I correctly understand your answers the problem is coming from
> this line:
> >
> > coef(lm(Score~ Time + factor(Country)),data=data[idx,])
> >
> > This line should be:
> > coef(lm(Score~ Time + factor(Country),data=data[idx,]))
> >
> > If yes, now I get an error message (code here below)! So, it still does
> not work.
> >
> > Error in t.star[r, ] <- res[[r]] :
> >number of items to replace is not a multiple of replacement length
> >
> >
> > ##
> >
> Score=c(345,564,467,675,432,346,476,512,567,543,234,435,654,411,356,658,432,345,432,345,
> 345,456,543,501)
> >
> > Country=c("Italy", "Italy", "Italy", "Turkey", "Turkey", "Turkey",
> "USA", "USA", "USA", "Korea", "Korea", "Korea", "Portugal", "Portugal",
> "Portugal", "UK", "UK", "UK", "Poland", "Poland", "Poland", "Austria",
> "Austria", "Austria")
> >
> > Time=c(1,2,3,1,2,3,1,2,3,1,2,3,1,2,3,1,2,3,1,2,3,1,2,3)
> >
> > e=data.frame(Score, Country, Time)
> >
> >
> > library(boot)
> > func= function(data, idx) {
> > coef(lm(Score~ Time + factor(Country),data=data[idx,]))
> > }
> > B= boot(e, func, R=1000)
> >
> > boot.ci(B, index=2, type="perc")
> > #
> >
> >
> >
> >
> >
> >
> >
> >
> > Le samedi 13 janvier 2024 à 21:56:58 UTC+1, Ivan Krylov <
> ikry...@disroot.org> a écrit :
> >
> >
> >
> >
> >
> > В Sat, 13 Jan 2024 20:33:47 + (UTC)
> >
> > varin sacha via R-help  пишет:
> >
> >> coef(lm(Score~ Time + factor(Country)),data=data[idx,])
> >
> >
> > Wrong place for the data=... argument. You meant to give it to lm(...),
> > but in the end it went to coef(...). Without the data=... argument, the
> > formula passed to lm() picks up the global variables inherited by the
> > func() closure.
> >
> > Unfortunately, S3 methods really do have to ignore extra arguments they
> > don't understand if the class is to be extended, so coef.lm isn't
> > allowed to complain to you about it.
> >
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide 

[R] Fwd: Strange results : bootrstrp CIs

2024-01-13 Thread Duncan Murdoch

Sorry, didn't cc this to the list.


 Forwarded Message 
Subject: Re: [R] Strange results : bootrstrp CIs
Date: Sat, 13 Jan 2024 17:37:19 -0500
From: Duncan Murdoch 
To: varin sacha 

You can debug things like this by setting options(error = recover). That 
will drop into the debugger when the error occurs.  Examine t.star, r, 
and res[[r]] and you will likely see what the problem was.


My guess is that one of the bootstrap samples had a different selection 
of countries, so factor(Country) had different levels, and that would 
really mess things up.


You'll need to decide how to handle that:  If you are trying to estimate 
the coefficient for Italy in a sample that contains no data from Italy, 
what should the coefficient be?  (This is easier for Bayesians to 
handle:  we don't need data!)


Duncan

On 13/01/2024 5:22 p.m., varin sacha via R-help wrote:

Dear Duncan,
Dear Ivan,

I really thank you a lot for your response.
So, if I correctly understand your answers the problem is coming from this line:

coef(lm(Score~ Time + factor(Country)),data=data[idx,])

This line should be:
coef(lm(Score~ Time + factor(Country),data=data[idx,]))

If yes, now I get an error message (code here below)! So, it still does not 
work.

Error in t.star[r, ] <- res[[r]] :
   number of items to replace is not a multiple of replacement length


##
Score=c(345,564,467,675,432,346,476,512,567,543,234,435,654,411,356,658,432,345,432,345,
 345,456,543,501)
  
Country=c("Italy", "Italy", "Italy", "Turkey", "Turkey", "Turkey", "USA", "USA", "USA", "Korea", "Korea", "Korea", "Portugal", "Portugal", "Portugal", "UK", "UK", "UK", "Poland", "Poland", "Poland", "Austria", "Austria", "Austria")
  
Time=c(1,2,3,1,2,3,1,2,3,1,2,3,1,2,3,1,2,3,1,2,3,1,2,3)
  
e=data.frame(Score, Country, Time)
  
  
library(boot)

func= function(data, idx) {
coef(lm(Score~ Time + factor(Country),data=data[idx,]))
}
B= boot(e, func, R=1000)
  
boot.ci(B, index=2, type="perc")

#








Le samedi 13 janvier 2024 à 21:56:58 UTC+1, Ivan Krylov  a 
écrit :





В Sat, 13 Jan 2024 20:33:47 + (UTC)

varin sacha via R-help  пишет:


coef(lm(Score~ Time + factor(Country)),data=data[idx,])



Wrong place for the data=... argument. You meant to give it to lm(...),
but in the end it went to coef(...). Without the data=... argument, the
formula passed to lm() picks up the global variables inherited by the
func() closure.

Unfortunately, S3 methods really do have to ignore extra arguments they
don't understand if the class is to be extended, so coef.lm isn't
allowed to complain to you about it.



__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Strange results : bootrstrp CIs

2024-01-13 Thread Ben Bolker
  It took me a little while to figure this out, but: the problem is 
that if your resampling leaves out any countries (which is very likely), 
your model applied to the bootstrapped data will have fewer coefficients 
than your original model.


I tried this:

cc <- unique(e$Country)
func <- function(data, idx) {
   coef(lm(Score~ Time + factor(Country, levels =cc),data=data[idx,]))
}

but lm() automatically drops coefficients for missing countries (I 
didn't think about it too hard, but I thought they might get returned as 
NA and that boot() might be able to handle that ...)


  If you want to do this I think you'll have to find a way to do a 
*stratified* bootstrap, restricting the bootstrap samples so that they 
always contain at least one sample from each country ... (I would have 
expected "strata = as.numeric(e$Country)" to do this, but it doesn't 
work the way I thought ... it tries to compute the statistics for *each* 
stratum ...)






 Debugging attempts:

set.seed(101)
options(error=recover)
B= boot(e, func, R=1000)


Error in t.star[r, ] <- res[[r]] :
  number of items to replace is not a multiple of replacement length

Enter a frame number, or 0 to exit

1: boot(e, func, R = 1000)



Selection: 1
Called from: top level
Browse[1]> print(r)
[1] 2
Browse[1]> t.star[r,]
[1] NA NA NA NA NA NA NA NA NA

i[2,]
 [1] 14 15 22 22 21 14  8  2 12 22 10 15  9  7  9 13 12 23  1 20 15  7 
5 10





On 2024-01-13 5:22 p.m., varin sacha via R-help wrote:

Dear Duncan,
Dear Ivan,

I really thank you a lot for your response.
So, if I correctly understand your answers the problem is coming from this line:

coef(lm(Score~ Time + factor(Country)),data=data[idx,])

This line should be:
coef(lm(Score~ Time + factor(Country),data=data[idx,]))

If yes, now I get an error message (code here below)! So, it still does not 
work.

Error in t.star[r, ] <- res[[r]] :
   number of items to replace is not a multiple of replacement length


##
Score=c(345,564,467,675,432,346,476,512,567,543,234,435,654,411,356,658,432,345,432,345,
 345,456,543,501)
  
Country=c("Italy", "Italy", "Italy", "Turkey", "Turkey", "Turkey", "USA", "USA", "USA", "Korea", "Korea", "Korea", "Portugal", "Portugal", "Portugal", "UK", "UK", "UK", "Poland", "Poland", "Poland", "Austria", "Austria", "Austria")
  
Time=c(1,2,3,1,2,3,1,2,3,1,2,3,1,2,3,1,2,3,1,2,3,1,2,3)
  
e=data.frame(Score, Country, Time)
  
  
library(boot)

func= function(data, idx) {
coef(lm(Score~ Time + factor(Country),data=data[idx,]))
}
B= boot(e, func, R=1000)
  
boot.ci(B, index=2, type="perc")

#








Le samedi 13 janvier 2024 à 21:56:58 UTC+1, Ivan Krylov  a 
écrit :





В Sat, 13 Jan 2024 20:33:47 + (UTC)

varin sacha via R-help  пишет:


coef(lm(Score~ Time + factor(Country)),data=data[idx,])



Wrong place for the data=... argument. You meant to give it to lm(...),
but in the end it went to coef(...). Without the data=... argument, the
formula passed to lm() picks up the global variables inherited by the
func() closure.

Unfortunately, S3 methods really do have to ignore extra arguments they
don't understand if the class is to be extended, so coef.lm isn't
allowed to complain to you about it.



__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Strange results : bootrstrp CIs

2024-01-13 Thread varin sacha via R-help
Dear Duncan,
Dear Ivan,

I really thank you a lot for your response.
So, if I correctly understand your answers the problem is coming from this line:

coef(lm(Score~ Time + factor(Country)),data=data[idx,])

This line should be:
coef(lm(Score~ Time + factor(Country),data=data[idx,]))

If yes, now I get an error message (code here below)! So, it still does not 
work.

Error in t.star[r, ] <- res[[r]] :
  number of items to replace is not a multiple of replacement length


##
Score=c(345,564,467,675,432,346,476,512,567,543,234,435,654,411,356,658,432,345,432,345,
 345,456,543,501)
 
Country=c("Italy", "Italy", "Italy", "Turkey", "Turkey", "Turkey", "USA", 
"USA", "USA", "Korea", "Korea", "Korea", "Portugal", "Portugal", "Portugal", 
"UK", "UK", "UK", "Poland", "Poland", "Poland", "Austria", "Austria", "Austria")
 
Time=c(1,2,3,1,2,3,1,2,3,1,2,3,1,2,3,1,2,3,1,2,3,1,2,3)
 
e=data.frame(Score, Country, Time)
 
 
library(boot)
func= function(data, idx) {
coef(lm(Score~ Time + factor(Country),data=data[idx,]))
}
B= boot(e, func, R=1000)
 
boot.ci(B, index=2, type="perc")
#








Le samedi 13 janvier 2024 à 21:56:58 UTC+1, Ivan Krylov  a 
écrit : 





В Sat, 13 Jan 2024 20:33:47 + (UTC)

varin sacha via R-help  пишет:

> coef(lm(Score~ Time + factor(Country)),data=data[idx,])


Wrong place for the data=... argument. You meant to give it to lm(...),
but in the end it went to coef(...). Without the data=... argument, the
formula passed to lm() picks up the global variables inherited by the
func() closure.

Unfortunately, S3 methods really do have to ignore extra arguments they
don't understand if the class is to be extended, so coef.lm isn't
allowed to complain to you about it.

-- 
Best regards,
Ivan

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Strange results : bootrstrp CIs

2024-01-13 Thread Ivan Krylov via R-help
В Sat, 13 Jan 2024 20:33:47 + (UTC)
varin sacha via R-help  пишет:

> coef(lm(Score~ Time + factor(Country)),data=data[idx,])

Wrong place for the data=... argument. You meant to give it to lm(...),
but in the end it went to coef(...). Without the data=... argument, the
formula passed to lm() picks up the global variables inherited by the
func() closure.

Unfortunately, S3 methods really do have to ignore extra arguments they
don't understand if the class is to be extended, so coef.lm isn't
allowed to complain to you about it.

-- 
Best regards,
Ivan

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Strange results : bootrstrp CIs

2024-01-13 Thread Duncan Murdoch

On 13/01/2024 3:33 p.m., varin sacha via R-help wrote:

Score=c(345,564,467,675,432,346,476,512,567,543,234,435,654,411,356,658,432,345,432,345,
  345,456,543,501)
  
Country=c("Italy", "Italy", "Italy", "Turkey", "Turkey", "Turkey",

"USA", "USA", "USA", "Korea", "Korea", "Korea", "Portugal", "Portugal",
"Portugal", "UK", "UK", "UK", "Poland", "Poland", "Poland", "Austria",
"Austria", "Austria")
  
Time=c(1,2,3,1,2,3,1,2,3,1,2,3,1,2,3,1,2,3,1,2,3,1,2,3)
  
e=data.frame(Score, Country, Time)
  
  
library(boot)

func= function(data, idx) {
coef(lm(Score~ Time + factor(Country)),data=data[idx,])
}
B= boot(e, func, R=1000)
  
boot.ci(B, index=2, type="perc")


Your function ignores the data, because it passes data[idx,] to coef(), 
not to lm().  coef() ignores it.  So the function is using the global 
variables you created earlier, not the ones in e.


Duncan Murdoch

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Strange results : bootrstrp CIs

2024-01-13 Thread varin sacha via R-help
Dear R-experts,

Here below, my R code working BUT I get a strange result I was not expecting! 
Indeed, the 95% percentile bootstrap CIs is (-54.81, -54.81 ). Is anything 
going wrong?

Best,

##
Score=c(345,564,467,675,432,346,476,512,567,543,234,435,654,411,356,658,432,345,432,345,
 345,456,543,501)
 
Country=c("Italy", "Italy", "Italy", "Turkey", "Turkey", "Turkey", "USA", 
"USA", "USA", "Korea", "Korea", "Korea", "Portugal", "Portugal", "Portugal", 
"UK", "UK", "UK", "Poland", "Poland", "Poland", "Austria", "Austria", "Austria")
 
Time=c(1,2,3,1,2,3,1,2,3,1,2,3,1,2,3,1,2,3,1,2,3,1,2,3)
 
e=data.frame(Score, Country, Time)
 
 
library(boot)
func= function(data, idx) {
coef(lm(Score~ Time + factor(Country)),data=data[idx,])
}
B= boot(e, func, R=1000)
 
boot.ci(B, index=2, type="perc")
#

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.