Re: [R] Fwd: Strange results : bootrstrp CIs
Dear R-experts, I really thank you all for your responses. Best, Le dimanche 14 janvier 2024 à 10:22:12 UTC+1, Duncan Murdoch a écrit : On 13/01/2024 8:58 p.m., Rolf Turner wrote: > On Sat, 13 Jan 2024 17:59:16 -0500 > Duncan Murdoch wrote: > > > >> My guess is that one of the bootstrap samples had a different >> selection of countries, so factor(Country) had different levels, and >> that would really mess things up. >> >> You'll need to decide how to handle that: If you are trying to >> estimate the coefficient for Italy in a sample that contains no data >> from Italy, what should the coefficient be? > > Perhaps NA? Ben Bolker conjectured that boot() might be able to handle > this. Getting the NAs into the coefficients is a bit of a fag, but. I > tried: My question was really intended as a statistical question. From a statistical perspective, if I have a sampling scheme that sometimes generates sample size 0, should my CI be (-Inf, Inf) for high enough confidence level? A Bayesian might say that inference should be entirely based on the prior in the case of no relevant data. You could get similar numerical results by adding some fake data to every bootstrap sample, e.g. a single weighted observation for each country at your prior mean for that country, with weight chosen to match the strength of the prior. But Bayesian methods don't give confidence intervals, they give credible intervals, and those aren't the same thing even if they are sometimes numerically similar. Duncan Murdoch > func <- function(data, idx) { > clyde <- coef(lm(Score~ Time + factor(Country),data=data)) > ccc <- coef(lm(Score~ Time + factor(Country),data=data[idx,])) > urk <- rep(NA,length(clyde)) > names(urk) <-names(clyde) > urk[names(ccc)] <- ccc > urk > } > > It produced a result: > >>> set.seed(42) >>> B= boot(e, func, R=1000) >> B >> >> ORDINARY NONPARAMETRIC BOOTSTRAP >> >> >> Call: >> boot(data = e, statistic = func, R = 1000) >> >> >> Bootstrap Statistics : >> original bias std. error >> t1* 609.62500 3.6204052 95.39452 >> t2* -54.81250 -1.6624704 36.32911 >> t3* -41.3 -2.7337992 100.72113 >> t4* -96.0 -1.0995718 99.78864 >> t5* -126.0 -0.6548886 63.47076 >> t6* -26.3 -1.6516683 87.80483 >> t7* -15.7 -0.8391170 91.72467 >> t8* -21.7 -5.4544013 83.69211 >> t9* 18.3 -0.7711001 85.57278 > > However I have no idea if the result is correct, or even meaningful. I > have no idea what I'm doing. Just hammering and hoping. ️ > > > > cheers, > > Rolf > __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Fwd: Strange results : bootrstrp CIs
On 13/01/2024 8:58 p.m., Rolf Turner wrote: On Sat, 13 Jan 2024 17:59:16 -0500 Duncan Murdoch wrote: My guess is that one of the bootstrap samples had a different selection of countries, so factor(Country) had different levels, and that would really mess things up. You'll need to decide how to handle that: If you are trying to estimate the coefficient for Italy in a sample that contains no data from Italy, what should the coefficient be? Perhaps NA? Ben Bolker conjectured that boot() might be able to handle this. Getting the NAs into the coefficients is a bit of a fag, but. I tried: My question was really intended as a statistical question. From a statistical perspective, if I have a sampling scheme that sometimes generates sample size 0, should my CI be (-Inf, Inf) for high enough confidence level? A Bayesian might say that inference should be entirely based on the prior in the case of no relevant data. You could get similar numerical results by adding some fake data to every bootstrap sample, e.g. a single weighted observation for each country at your prior mean for that country, with weight chosen to match the strength of the prior. But Bayesian methods don't give confidence intervals, they give credible intervals, and those aren't the same thing even if they are sometimes numerically similar. Duncan Murdoch func <- function(data, idx) { clyde <- coef(lm(Score~ Time + factor(Country),data=data)) ccc <- coef(lm(Score~ Time + factor(Country),data=data[idx,])) urk <- rep(NA,length(clyde)) names(urk) <-names(clyde) urk[names(ccc)] <- ccc urk } It produced a result: set.seed(42) B= boot(e, func, R=1000) B ORDINARY NONPARAMETRIC BOOTSTRAP Call: boot(data = e, statistic = func, R = 1000) Bootstrap Statistics : original biasstd. error t1* 609.62500 3.620405295.39452 t2* -54.81250 -1.662470436.32911 t3* -41.3 -2.7337992 100.72113 t4* -96.0 -1.099571899.78864 t5* -126.0 -0.654888663.47076 t6* -26.3 -1.651668387.80483 t7* -15.7 -0.839117091.72467 t8* -21.7 -5.454401383.69211 t9* 18.3 -0.771100185.57278 However I have no idea if the result is correct, or even meaningful. I have no idea what I'm doing. Just hammering and hoping. ️ cheers, Rolf __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Fwd: Strange results : bootrstrp CIs
On Sat, 13 Jan 2024 17:59:16 -0500 Duncan Murdoch wrote: > My guess is that one of the bootstrap samples had a different > selection of countries, so factor(Country) had different levels, and > that would really mess things up. > > You'll need to decide how to handle that: If you are trying to > estimate the coefficient for Italy in a sample that contains no data > from Italy, what should the coefficient be? Perhaps NA? Ben Bolker conjectured that boot() might be able to handle this. Getting the NAs into the coefficients is a bit of a fag, but. I tried: func <- function(data, idx) { clyde <- coef(lm(Score~ Time + factor(Country),data=data)) ccc <- coef(lm(Score~ Time + factor(Country),data=data[idx,])) urk <- rep(NA,length(clyde)) names(urk) <-names(clyde) urk[names(ccc)] <- ccc urk } It produced a result: > > set.seed(42) > > B= boot(e, func, R=1000) > B > > ORDINARY NONPARAMETRIC BOOTSTRAP > > > Call: > boot(data = e, statistic = func, R = 1000) > > > Bootstrap Statistics : > original biasstd. error > t1* 609.62500 3.620405295.39452 > t2* -54.81250 -1.662470436.32911 > t3* -41.3 -2.7337992 100.72113 > t4* -96.0 -1.099571899.78864 > t5* -126.0 -0.654888663.47076 > t6* -26.3 -1.651668387.80483 > t7* -15.7 -0.839117091.72467 > t8* -21.7 -5.454401383.69211 > t9* 18.3 -0.771100185.57278 However I have no idea if the result is correct, or even meaningful. I have no idea what I'm doing. Just hammering and hoping. ️ cheers, Rolf -- Honorary Research Fellow Department of Statistics University of Auckland Stats. Dep't. (secretaries) phone: +64-9-373-7599 ext. 89622 Home phone: +64-9-480-4619 __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Fwd: Strange results : bootrstrp CIs
Sorry, didn't cc this to the list. Forwarded Message Subject: Re: [R] Strange results : bootrstrp CIs Date: Sat, 13 Jan 2024 17:37:19 -0500 From: Duncan Murdoch To: varin sacha You can debug things like this by setting options(error = recover). That will drop into the debugger when the error occurs. Examine t.star, r, and res[[r]] and you will likely see what the problem was. My guess is that one of the bootstrap samples had a different selection of countries, so factor(Country) had different levels, and that would really mess things up. You'll need to decide how to handle that: If you are trying to estimate the coefficient for Italy in a sample that contains no data from Italy, what should the coefficient be? (This is easier for Bayesians to handle: we don't need data!) Duncan On 13/01/2024 5:22 p.m., varin sacha via R-help wrote: Dear Duncan, Dear Ivan, I really thank you a lot for your response. So, if I correctly understand your answers the problem is coming from this line: coef(lm(Score~ Time + factor(Country)),data=data[idx,]) This line should be: coef(lm(Score~ Time + factor(Country),data=data[idx,])) If yes, now I get an error message (code here below)! So, it still does not work. Error in t.star[r, ] <- res[[r]] : number of items to replace is not a multiple of replacement length ## Score=c(345,564,467,675,432,346,476,512,567,543,234,435,654,411,356,658,432,345,432,345, 345,456,543,501) Country=c("Italy", "Italy", "Italy", "Turkey", "Turkey", "Turkey", "USA", "USA", "USA", "Korea", "Korea", "Korea", "Portugal", "Portugal", "Portugal", "UK", "UK", "UK", "Poland", "Poland", "Poland", "Austria", "Austria", "Austria") Time=c(1,2,3,1,2,3,1,2,3,1,2,3,1,2,3,1,2,3,1,2,3,1,2,3) e=data.frame(Score, Country, Time) library(boot) func= function(data, idx) { coef(lm(Score~ Time + factor(Country),data=data[idx,])) } B= boot(e, func, R=1000) boot.ci(B, index=2, type="perc") # Le samedi 13 janvier 2024 à 21:56:58 UTC+1, Ivan Krylov a écrit : В Sat, 13 Jan 2024 20:33:47 + (UTC) varin sacha via R-help пишет: coef(lm(Score~ Time + factor(Country)),data=data[idx,]) Wrong place for the data=... argument. You meant to give it to lm(...), but in the end it went to coef(...). Without the data=... argument, the formula passed to lm() picks up the global variables inherited by the func() closure. Unfortunately, S3 methods really do have to ignore extra arguments they don't understand if the class is to be extended, so coef.lm isn't allowed to complain to you about it. __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.