Re: [R] Strange results : bootrstrp CIs
On Sat, 13 Jan 2024 16:54:01 -0800 Bert Gunter wrote: > Well, this would seem to work: > > e <- data.frame(Score = Score > , Country = factor(Country) > , Time = Time) > > ncountry <- nlevels(e$Country) > func= function(dat,idx) { >if(length(unique(dat[idx,'Country'])) < ncountry) NA >else coef(lm(Score~ Time + Country,data = dat[idx,])) > } > B <- boot(e, func, R=1000) > > boot.ci(B, index=2, type="perc") > > Caveats: > 1) boot.ci handles the NA's by omitting them, which of course gives a > smaller resample and longer CI's than the value of R specified in the > call to boot(). > > 2) I do not know if the *nice* statistical properties of the > nonparametric bootstrap, e.g. asymptotic correctness, hold when > bootstrap samples are produced in this way. I leave that to wiser > heads than me. It seems to me that my shaganappi idea causes func() to return a vector of coefficients with NAs corresponding to any missing levels of the "Country" factor, whereas your idea causes it to return a scalar NA whenever one or more of the levels of the "Country" factor is missing. I have no idea what the implications of this are. As I said before, I have no idea what I am doing! cheers, Rolf -- Honorary Research Fellow Department of Statistics University of Auckland Stats. Dep't. (secretaries) phone: +64-9-373-7599 ext. 89622 Home phone: +64-9-480-4619 __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Fwd: Strange results : bootrstrp CIs
On Sat, 13 Jan 2024 17:59:16 -0500 Duncan Murdoch wrote: > My guess is that one of the bootstrap samples had a different > selection of countries, so factor(Country) had different levels, and > that would really mess things up. > > You'll need to decide how to handle that: If you are trying to > estimate the coefficient for Italy in a sample that contains no data > from Italy, what should the coefficient be? Perhaps NA? Ben Bolker conjectured that boot() might be able to handle this. Getting the NAs into the coefficients is a bit of a fag, but. I tried: func <- function(data, idx) { clyde <- coef(lm(Score~ Time + factor(Country),data=data)) ccc <- coef(lm(Score~ Time + factor(Country),data=data[idx,])) urk <- rep(NA,length(clyde)) names(urk) <-names(clyde) urk[names(ccc)] <- ccc urk } It produced a result: > > set.seed(42) > > B= boot(e, func, R=1000) > B > > ORDINARY NONPARAMETRIC BOOTSTRAP > > > Call: > boot(data = e, statistic = func, R = 1000) > > > Bootstrap Statistics : > original biasstd. error > t1* 609.62500 3.620405295.39452 > t2* -54.81250 -1.662470436.32911 > t3* -41.3 -2.7337992 100.72113 > t4* -96.0 -1.099571899.78864 > t5* -126.0 -0.654888663.47076 > t6* -26.3 -1.651668387.80483 > t7* -15.7 -0.839117091.72467 > t8* -21.7 -5.454401383.69211 > t9* 18.3 -0.771100185.57278 However I have no idea if the result is correct, or even meaningful. I have no idea what I'm doing. Just hammering and hoping. ️ cheers, Rolf -- Honorary Research Fellow Department of Statistics University of Auckland Stats. Dep't. (secretaries) phone: +64-9-373-7599 ext. 89622 Home phone: +64-9-480-4619 __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Strange results : bootrstrp CIs
Well, this would seem to work: e <- data.frame(Score = Score , Country = factor(Country) , Time = Time) ncountry <- nlevels(e$Country) func= function(dat,idx) { if(length(unique(dat[idx,'Country'])) < ncountry) NA else coef(lm(Score~ Time + Country,data = dat[idx,])) } B <- boot(e, func, R=1000) boot.ci(B, index=2, type="perc") Caveats: 1) boot.ci handles the NA's by omitting them, which of course gives a smaller resample and longer CI's than the value of R specified in the call to boot(). 2) I do not know if the *nice* statistical properties of the nonparametric bootstrap, e.g. asymptotic correctness, hold when bootstrap samples are produced in this way. I leave that to wiser heads than me. Cheers, Bert On Sat, Jan 13, 2024 at 2:51 PM Ben Bolker wrote: >It took me a little while to figure this out, but: the problem is > that if your resampling leaves out any countries (which is very likely), > your model applied to the bootstrapped data will have fewer coefficients > than your original model. > > I tried this: > > cc <- unique(e$Country) > func <- function(data, idx) { > coef(lm(Score~ Time + factor(Country, levels =cc),data=data[idx,])) > } > > but lm() automatically drops coefficients for missing countries (I > didn't think about it too hard, but I thought they might get returned as > NA and that boot() might be able to handle that ...) > >If you want to do this I think you'll have to find a way to do a > *stratified* bootstrap, restricting the bootstrap samples so that they > always contain at least one sample from each country ... (I would have > expected "strata = as.numeric(e$Country)" to do this, but it doesn't > work the way I thought ... it tries to compute the statistics for *each* > stratum ...) > > > > > > Debugging attempts: > > set.seed(101) > options(error=recover) > B= boot(e, func, R=1000) > > > Error in t.star[r, ] <- res[[r]] : >number of items to replace is not a multiple of replacement length > > Enter a frame number, or 0 to exit > > 1: boot(e, func, R = 1000) > > > > Selection: 1 > Called from: top level > Browse[1]> print(r) > [1] 2 > Browse[1]> t.star[r,] > [1] NA NA NA NA NA NA NA NA NA > > i[2,] > [1] 14 15 22 22 21 14 8 2 12 22 10 15 9 7 9 13 12 23 1 20 15 7 > 5 10 > > > > > On 2024-01-13 5:22 p.m., varin sacha via R-help wrote: > > Dear Duncan, > > Dear Ivan, > > > > I really thank you a lot for your response. > > So, if I correctly understand your answers the problem is coming from > this line: > > > > coef(lm(Score~ Time + factor(Country)),data=data[idx,]) > > > > This line should be: > > coef(lm(Score~ Time + factor(Country),data=data[idx,])) > > > > If yes, now I get an error message (code here below)! So, it still does > not work. > > > > Error in t.star[r, ] <- res[[r]] : > >number of items to replace is not a multiple of replacement length > > > > > > ## > > > Score=c(345,564,467,675,432,346,476,512,567,543,234,435,654,411,356,658,432,345,432,345, > 345,456,543,501) > > > > Country=c("Italy", "Italy", "Italy", "Turkey", "Turkey", "Turkey", > "USA", "USA", "USA", "Korea", "Korea", "Korea", "Portugal", "Portugal", > "Portugal", "UK", "UK", "UK", "Poland", "Poland", "Poland", "Austria", > "Austria", "Austria") > > > > Time=c(1,2,3,1,2,3,1,2,3,1,2,3,1,2,3,1,2,3,1,2,3,1,2,3) > > > > e=data.frame(Score, Country, Time) > > > > > > library(boot) > > func= function(data, idx) { > > coef(lm(Score~ Time + factor(Country),data=data[idx,])) > > } > > B= boot(e, func, R=1000) > > > > boot.ci(B, index=2, type="perc") > > # > > > > > > > > > > > > > > > > > > Le samedi 13 janvier 2024 à 21:56:58 UTC+1, Ivan Krylov < > ikry...@disroot.org> a écrit : > > > > > > > > > > > > В Sat, 13 Jan 2024 20:33:47 + (UTC) > > > > varin sacha via R-help пишет: > > > >> coef(lm(Score~ Time + factor(Country)),data=data[idx,]) > > > > > > Wrong place for the data=... argument. You meant to give it to lm(...), > > but in the end it went to coef(...). Without the data=... argument, the > > formula passed to lm() picks up the global variables inherited by the > > func() closure. > > > > Unfortunately, S3 methods really do have to ignore extra arguments they > > don't understand if the class is to be extended, so coef.lm isn't > > allowed to complain to you about it. > > > > __ > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > [[alternative HTML version deleted]] __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide
[R] Fwd: Strange results : bootrstrp CIs
Sorry, didn't cc this to the list. Forwarded Message Subject: Re: [R] Strange results : bootrstrp CIs Date: Sat, 13 Jan 2024 17:37:19 -0500 From: Duncan Murdoch To: varin sacha You can debug things like this by setting options(error = recover). That will drop into the debugger when the error occurs. Examine t.star, r, and res[[r]] and you will likely see what the problem was. My guess is that one of the bootstrap samples had a different selection of countries, so factor(Country) had different levels, and that would really mess things up. You'll need to decide how to handle that: If you are trying to estimate the coefficient for Italy in a sample that contains no data from Italy, what should the coefficient be? (This is easier for Bayesians to handle: we don't need data!) Duncan On 13/01/2024 5:22 p.m., varin sacha via R-help wrote: Dear Duncan, Dear Ivan, I really thank you a lot for your response. So, if I correctly understand your answers the problem is coming from this line: coef(lm(Score~ Time + factor(Country)),data=data[idx,]) This line should be: coef(lm(Score~ Time + factor(Country),data=data[idx,])) If yes, now I get an error message (code here below)! So, it still does not work. Error in t.star[r, ] <- res[[r]] : number of items to replace is not a multiple of replacement length ## Score=c(345,564,467,675,432,346,476,512,567,543,234,435,654,411,356,658,432,345,432,345, 345,456,543,501) Country=c("Italy", "Italy", "Italy", "Turkey", "Turkey", "Turkey", "USA", "USA", "USA", "Korea", "Korea", "Korea", "Portugal", "Portugal", "Portugal", "UK", "UK", "UK", "Poland", "Poland", "Poland", "Austria", "Austria", "Austria") Time=c(1,2,3,1,2,3,1,2,3,1,2,3,1,2,3,1,2,3,1,2,3,1,2,3) e=data.frame(Score, Country, Time) library(boot) func= function(data, idx) { coef(lm(Score~ Time + factor(Country),data=data[idx,])) } B= boot(e, func, R=1000) boot.ci(B, index=2, type="perc") # Le samedi 13 janvier 2024 à 21:56:58 UTC+1, Ivan Krylov a écrit : В Sat, 13 Jan 2024 20:33:47 + (UTC) varin sacha via R-help пишет: coef(lm(Score~ Time + factor(Country)),data=data[idx,]) Wrong place for the data=... argument. You meant to give it to lm(...), but in the end it went to coef(...). Without the data=... argument, the formula passed to lm() picks up the global variables inherited by the func() closure. Unfortunately, S3 methods really do have to ignore extra arguments they don't understand if the class is to be extended, so coef.lm isn't allowed to complain to you about it. __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Strange results : bootrstrp CIs
It took me a little while to figure this out, but: the problem is that if your resampling leaves out any countries (which is very likely), your model applied to the bootstrapped data will have fewer coefficients than your original model. I tried this: cc <- unique(e$Country) func <- function(data, idx) { coef(lm(Score~ Time + factor(Country, levels =cc),data=data[idx,])) } but lm() automatically drops coefficients for missing countries (I didn't think about it too hard, but I thought they might get returned as NA and that boot() might be able to handle that ...) If you want to do this I think you'll have to find a way to do a *stratified* bootstrap, restricting the bootstrap samples so that they always contain at least one sample from each country ... (I would have expected "strata = as.numeric(e$Country)" to do this, but it doesn't work the way I thought ... it tries to compute the statistics for *each* stratum ...) Debugging attempts: set.seed(101) options(error=recover) B= boot(e, func, R=1000) Error in t.star[r, ] <- res[[r]] : number of items to replace is not a multiple of replacement length Enter a frame number, or 0 to exit 1: boot(e, func, R = 1000) Selection: 1 Called from: top level Browse[1]> print(r) [1] 2 Browse[1]> t.star[r,] [1] NA NA NA NA NA NA NA NA NA i[2,] [1] 14 15 22 22 21 14 8 2 12 22 10 15 9 7 9 13 12 23 1 20 15 7 5 10 On 2024-01-13 5:22 p.m., varin sacha via R-help wrote: Dear Duncan, Dear Ivan, I really thank you a lot for your response. So, if I correctly understand your answers the problem is coming from this line: coef(lm(Score~ Time + factor(Country)),data=data[idx,]) This line should be: coef(lm(Score~ Time + factor(Country),data=data[idx,])) If yes, now I get an error message (code here below)! So, it still does not work. Error in t.star[r, ] <- res[[r]] : number of items to replace is not a multiple of replacement length ## Score=c(345,564,467,675,432,346,476,512,567,543,234,435,654,411,356,658,432,345,432,345, 345,456,543,501) Country=c("Italy", "Italy", "Italy", "Turkey", "Turkey", "Turkey", "USA", "USA", "USA", "Korea", "Korea", "Korea", "Portugal", "Portugal", "Portugal", "UK", "UK", "UK", "Poland", "Poland", "Poland", "Austria", "Austria", "Austria") Time=c(1,2,3,1,2,3,1,2,3,1,2,3,1,2,3,1,2,3,1,2,3,1,2,3) e=data.frame(Score, Country, Time) library(boot) func= function(data, idx) { coef(lm(Score~ Time + factor(Country),data=data[idx,])) } B= boot(e, func, R=1000) boot.ci(B, index=2, type="perc") # Le samedi 13 janvier 2024 à 21:56:58 UTC+1, Ivan Krylov a écrit : В Sat, 13 Jan 2024 20:33:47 + (UTC) varin sacha via R-help пишет: coef(lm(Score~ Time + factor(Country)),data=data[idx,]) Wrong place for the data=... argument. You meant to give it to lm(...), but in the end it went to coef(...). Without the data=... argument, the formula passed to lm() picks up the global variables inherited by the func() closure. Unfortunately, S3 methods really do have to ignore extra arguments they don't understand if the class is to be extended, so coef.lm isn't allowed to complain to you about it. __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Strange results : bootrstrp CIs
Dear Duncan, Dear Ivan, I really thank you a lot for your response. So, if I correctly understand your answers the problem is coming from this line: coef(lm(Score~ Time + factor(Country)),data=data[idx,]) This line should be: coef(lm(Score~ Time + factor(Country),data=data[idx,])) If yes, now I get an error message (code here below)! So, it still does not work. Error in t.star[r, ] <- res[[r]] : number of items to replace is not a multiple of replacement length ## Score=c(345,564,467,675,432,346,476,512,567,543,234,435,654,411,356,658,432,345,432,345, 345,456,543,501) Country=c("Italy", "Italy", "Italy", "Turkey", "Turkey", "Turkey", "USA", "USA", "USA", "Korea", "Korea", "Korea", "Portugal", "Portugal", "Portugal", "UK", "UK", "UK", "Poland", "Poland", "Poland", "Austria", "Austria", "Austria") Time=c(1,2,3,1,2,3,1,2,3,1,2,3,1,2,3,1,2,3,1,2,3,1,2,3) e=data.frame(Score, Country, Time) library(boot) func= function(data, idx) { coef(lm(Score~ Time + factor(Country),data=data[idx,])) } B= boot(e, func, R=1000) boot.ci(B, index=2, type="perc") # Le samedi 13 janvier 2024 à 21:56:58 UTC+1, Ivan Krylov a écrit : В Sat, 13 Jan 2024 20:33:47 + (UTC) varin sacha via R-help пишет: > coef(lm(Score~ Time + factor(Country)),data=data[idx,]) Wrong place for the data=... argument. You meant to give it to lm(...), but in the end it went to coef(...). Without the data=... argument, the formula passed to lm() picks up the global variables inherited by the func() closure. Unfortunately, S3 methods really do have to ignore extra arguments they don't understand if the class is to be extended, so coef.lm isn't allowed to complain to you about it. -- Best regards, Ivan __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Strange results : bootrstrp CIs
В Sat, 13 Jan 2024 20:33:47 + (UTC) varin sacha via R-help пишет: > coef(lm(Score~ Time + factor(Country)),data=data[idx,]) Wrong place for the data=... argument. You meant to give it to lm(...), but in the end it went to coef(...). Without the data=... argument, the formula passed to lm() picks up the global variables inherited by the func() closure. Unfortunately, S3 methods really do have to ignore extra arguments they don't understand if the class is to be extended, so coef.lm isn't allowed to complain to you about it. -- Best regards, Ivan __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Strange results : bootrstrp CIs
On 13/01/2024 3:33 p.m., varin sacha via R-help wrote: Score=c(345,564,467,675,432,346,476,512,567,543,234,435,654,411,356,658,432,345,432,345, 345,456,543,501) Country=c("Italy", "Italy", "Italy", "Turkey", "Turkey", "Turkey", "USA", "USA", "USA", "Korea", "Korea", "Korea", "Portugal", "Portugal", "Portugal", "UK", "UK", "UK", "Poland", "Poland", "Poland", "Austria", "Austria", "Austria") Time=c(1,2,3,1,2,3,1,2,3,1,2,3,1,2,3,1,2,3,1,2,3,1,2,3) e=data.frame(Score, Country, Time) library(boot) func= function(data, idx) { coef(lm(Score~ Time + factor(Country)),data=data[idx,]) } B= boot(e, func, R=1000) boot.ci(B, index=2, type="perc") Your function ignores the data, because it passes data[idx,] to coef(), not to lm(). coef() ignores it. So the function is using the global variables you created earlier, not the ones in e. Duncan Murdoch __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Strange results : bootrstrp CIs
Dear R-experts, Here below, my R code working BUT I get a strange result I was not expecting! Indeed, the 95% percentile bootstrap CIs is (-54.81, -54.81 ). Is anything going wrong? Best, ## Score=c(345,564,467,675,432,346,476,512,567,543,234,435,654,411,356,658,432,345,432,345, 345,456,543,501) Country=c("Italy", "Italy", "Italy", "Turkey", "Turkey", "Turkey", "USA", "USA", "USA", "Korea", "Korea", "Korea", "Portugal", "Portugal", "Portugal", "UK", "UK", "UK", "Poland", "Poland", "Poland", "Austria", "Austria", "Austria") Time=c(1,2,3,1,2,3,1,2,3,1,2,3,1,2,3,1,2,3,1,2,3,1,2,3) e=data.frame(Score, Country, Time) library(boot) func= function(data, idx) { coef(lm(Score~ Time + factor(Country)),data=data[idx,]) } B= boot(e, func, R=1000) boot.ci(B, index=2, type="perc") # __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.