Re: [R] How to do bootstrap for the complex sample design?
Dear Professor Lumley,

Thank you so much for your invaluable advice! I will digest it and try the different methods.

Many thanks again!

Faye

> Date: Fri, 5 Nov 2010 08:24:00 +1300
> Subject: Re: [R] How to do bootstrap for the complex sample design?
> From: tlum...@uw.edu
> To: timhesterb...@gmail.com
> CC: feix...@hotmail.com; r-help@r-project.org

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] How to do bootstrap for the complex sample design?
On Fri, Nov 5, 2010 at 3:51 AM, Tim Hesterberg wrote:
> Three remedies are:
> * draw bootstrap samples of size n-1
> * "bootknife" sampling - omit one observation (a jackknife sample), then
>   draw a bootstrap sample of size n from that
> * bootstrap from a kernel density estimate, with kernel covariance equal
>   to empirical covariance (with divisor n-1) / n.
>
> All three are undefined for samples of size 1. You need to go to some
> other bootstrap, e.g. a parametric bootstrap with variability estimated
> from other data.

And the 'survey' package supplies the first option. (It also supplies a bootstrap sample of size n that allows finite population corrections, designed for situations with a large n and a high sampling fraction, such as some business surveys.)

With a sample size of 1 per stratum there are no design-unbiased estimators of the standard error, so, as others have said, you need external data.

-thomas

--
Thomas Lumley
Professor of Biostatistics
University of Auckland
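A minimal sketch of how the two bootstrap options mentioned above might be requested from the 'survey' package. It assumes that as.svrepdesign() with type = "subbootstrap" is the size n-1 bootstrap and that type = "bootstrap" is the size-n version that uses the finite population correction; please check ?as.svrepdesign before relying on that mapping. The data are the `apistrat` example shipped with the package, not the survey discussed in this thread, and the object names are illustrative only.

```r
# Guarded so the sketch is skipped when the 'survey' package is not installed.
if (requireNamespace("survey", quietly = TRUE)) {
  library(survey)
  data(api)  # example data sets shipped with the survey package

  # Stratified sample of schools, declared without population sizes.
  d_nofpc <- svydesign(ids = ~1, strata = ~stype, weights = ~pw,
                       data = apistrat)
  # Same sample, with stratum population sizes supplied through `fpc`.
  d_fpc <- svydesign(ids = ~1, strata = ~stype, weights = ~pw,
                     fpc = ~fpc, data = apistrat)

  # Assumed mapping: "subbootstrap" resamples n-1 units per stratum.
  sub_rep <- as.svrepdesign(d_nofpc, type = "subbootstrap", replicates = 200)
  # Assumed mapping: "bootstrap" resamples n units and can use the fpc.
  fpc_rep <- as.svrepdesign(d_fpc, type = "bootstrap", replicates = 200)

  est_sub <- svymean(~api00, sub_rep)
  est_fpc <- svymean(~api00, fpc_rep)
  print(est_sub)
  print(est_fpc)
}
```

Neither option helps with a stratum that contains a single sampled unit, which is the point made in the message above.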
Re: [R] How to do bootstrap for the complex sample design?
Faye wrote:
>Our survey is structured as follows: the area to be investigated is
>divided into 6 regions; within each region, one urban community and one
>rural community are randomly selected; then samples are randomly drawn
>from each selected urban and rural community.
>
>The problem is that in each urban/rural stratum we have only one sample.
>In this case, how to do bootstrap?

You are lucky that your sample size is 1. If it were 2 you would probably have proceeded without realizing that the answers were wrong.

Suppose you had two samples in each stratum. If you proceed naturally, drawing bootstrap samples of size 2 from each stratum, this would underestimate variability (the variance) by a factor of 2.

In general the ordinary nonparametric bootstrap estimates of variability are biased downward by a factor of (n-1)/n -- exactly for the mean, approximately for other statistics. In multiple-sample and stratified situations, the bias depends on the stratum sizes.

Three remedies are:
* draw bootstrap samples of size n-1
* "bootknife" sampling - omit one observation (a jackknife sample), then
  draw a bootstrap sample of size n from that
* bootstrap from a kernel density estimate, with kernel covariance equal
  to empirical covariance (with divisor n-1) / n.

The latter two are described in
Hesterberg, Tim C. (2004), Unbiasing the Bootstrap-Bootknife Sampling vs. Smoothing, Proceedings of the Section on Statistics and the Environment, American Statistical Association, 2924-2930.
http://home.comcast.net/~timhesterberg/articles/JSM04-bootknife.pdf

All three are undefined for samples of size 1. You need to go to some other bootstrap, e.g. a parametric bootstrap with variability estimated from other data.

Tim Hesterberg
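The (n-1)/n bias and the first two remedies can be checked numerically. The following is a minimal sketch for the mean of one small sample; the data, seed, and variable names are made up for illustration and are not taken from the survey in this thread. Each bootstrap variance is divided by the usual estimate s^2/n, so the ordinary bootstrap should come out near (n-1)/n = 0.8 for n = 5, and the two remedies near 1, up to Monte Carlo error.

```r
set.seed(1)        # arbitrary seed, for reproducibility only
n <- 5
x <- rnorm(n)      # toy data (hypothetical)
B <- 20000         # number of bootstrap replicates

# Ordinary bootstrap: draw n values with replacement.
ordinary <- replicate(B, mean(sample(x, n, replace = TRUE)))

# Remedy 1: draw only n-1 values with replacement.
reduced <- replicate(B, mean(sample(x, n - 1, replace = TRUE)))

# Remedy 2 (bootknife): omit one observation at random, then draw n values
# with replacement from the remaining n-1.
bootknife <- replicate(B, {
  kept <- x[-sample.int(n, 1)]
  mean(kept[sample.int(n - 1, n, replace = TRUE)])
})

target <- var(x) / n   # usual estimate of Var(mean), divisor n-1 inside var()
ratios <- c(ordinary  = var(ordinary),
            reduced   = var(reduced),
            bootknife = var(bootknife)) / target
print(round(ratios, 2))
```

With n = 1 none of this can be run: there is nothing left after omitting an observation, and a sample of size n-1 is empty.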
Re: [R] How to do bootstrap for the complex sample design?
At 01:38 AM 11/4/2010, Fei xu wrote:
> Hello;
>
> Our survey is structured as follows: the area to be investigated is
> divided into 6 regions; within each region, one urban community and one
> rural community are randomly selected; then samples are randomly drawn
> from each selected urban and rural community.
>
> The problem is that in each urban/rural stratum we have only one sample.
> In this case, how to do bootstrap?
>
> Any comments or hints are greatly appreciated!
>
> Faye

Just make a table of your data, with each row corresponding to a measurement. Your columns will be Region, UrbanCommunity, RuralCommunity and your response variables.

Bootstrap resampling is just generating random row indices into this table, with replacement. I.e.,

    index <- sample(1:N, N, replace=TRUE)

Then your resample is myTable[index,].

Because you chose UrbanCommunity and RuralCommunity randomly, this shouldn't be a problem.

The fact that you chose a subsample size of 1 means you won't be able to estimate within-region variances unless you make some serious assumptions (e.g., UrbanCommunity effect independent of Region effect).

Robert A. LaBudde, PhD, PAS, Dpl. ACAFS  e-mail: r...@lcfltd.com
Least Cost Formulations, Ltd.            URL: http://lcfltd.com/
824 Timberlake Drive                     Tel: 757-467-0954
Virginia Beach, VA 23464-3239            Fax: 757-467-2947

"Vere scire est per causas scire"
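A minimal sketch of the row-index resampling described above. The table layout (a single Community column instead of separate urban and rural columns), the response `y`, and the number of replicates are assumptions made for illustration; only `index`, `N`, and `myTable` come from the message.

```r
set.seed(42)   # arbitrary seed, for reproducibility only

# Hypothetical table: one row per measurement.
myTable <- data.frame(
  Region    = rep(1:6, each = 4),
  Community = rep(c("urban", "rural"), times = 12),
  y         = rnorm(24, mean = 10)
)
N <- nrow(myTable)

# Repeat the resampling step many times and record the statistic each time.
boot_means <- replicate(1000, {
  index <- sample(1:N, N, replace = TRUE)  # random row indices, with replacement
  resample <- myTable[index, ]
  mean(resample$y)
})

se_naive <- sd(boot_means)  # bootstrap standard error of the overall mean
print(se_naive)
```

This treats all rows as exchangeable and ignores the regions and communities, so it is the simplest possible version; the other replies in this thread discuss why stratified designs need more care.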
[R] How to do bootstrap for the complex sample design?
Hello;

Our survey is structured as follows: the area to be investigated is divided into 6 regions; within each region, one urban community and one rural community are randomly selected; then samples are randomly drawn from each selected urban and rural community.

The problem is that in each urban/rural stratum we have only one sample. In this case, how to do bootstrap?

Any comments or hints are greatly appreciated!

Faye