[R] Distinct combinations for bootstrapping small sets

2007-03-06 Thread S Ellison
Small data sets (6-12 values, or a similarly small number of groups) which 
don't look nice and symmetric are quite common in my field (analytical 
chemistry and biological variants thereof), and often contain outliers or at 
least stragglers that I cannot simply discard. One of the things I occasionally 
do when I want to see what different assumptions do to my confidence intervals 
is to run a quick nonparametric bootstrap, just to get a feel for how 
asymmetric the distribution of any estimates might be. At the moment, I'm also 
interested in doing that on some historical data to evaluate some proposed 
estimators for interlab studies.

boot() is pretty good, but it's obvious that with such small sets, there aren't 
really many distinct resampled combinations (e.g. 92378 for 10 data points). So 
I'm really resampling from quite a small population of possible bootstrap 
samples. It's surely more efficient to generate all the different (resampled) 
combinations of the data set, and use those and their frequencies to get things 
like the bootstrap variance exactly. At worst, that'll stop us fooling 
ourselves into thinking more replicates will get better info.
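
As a quick check of that count: ignoring order, a resample of n distinct values is a multiset of size n drawn from n items, so there are choose(2n - 1, n) of them. A minimal base-R sketch (the function name n_distinct_resamples is purely illustrative, not from any package):

```r
# Number of distinct bootstrap resamples (order ignored) of n distinct values:
# multisets of size n from n items = choose(2n - 1, n).
n_distinct_resamples <- function(n) choose(2 * n - 1, n)

n_distinct_resamples(10)    # 92378, the figure quoted above
n_distinct_resamples(6:12)  # counts for the sample sizes mentioned earlier
```

The count assumes all data values are distinct; ties in the data reduce the number of distinguishable resamples further.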

A lengthy dig around R-help and CRAN turned up a blank on generating distinct 
combinations with resampling, so I've written a couple of routines to generate 
the distinct combinations and their frequencies. (They work, though I wouldn't 
guarantee great efficiency.) But if a chemist (me) can think of it, it's pretty 
certain that a statistician already has. Before I spend hours polishing code, 
is there already something out there I've missed?  
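
For illustration only, here is one way the idea could be sketched in base R. These are not the routines mentioned above; the function names and the data are hypothetical. Each distinct resample is represented by a vector of counts (how often each original value appears), enumerated by a stars-and-bars construction with combn(), and weighted by its multinomial probability from dmultinom(). It assumes n >= 2 and makes no claim to efficiency.

```r
# Enumerate all count vectors (k_1, ..., k_n) with sum(k) == n.
# Each row is one distinct resample in which x[i] appears k_i times.
distinct_resamples <- function(n) {
  # Stars and bars: place n - 1 "bars" among 2n - 1 slots; the gaps
  # between consecutive bars give the counts.
  bars <- combn(2 * n - 1, n - 1)
  counts <- apply(bars, 2, function(b) diff(c(0, b, 2 * n)) - 1)
  t(counts)
}

# Evaluate a statistic on every distinct resample, with its exact probability.
exact_bootstrap <- function(x, statistic) {
  n <- length(x)
  counts <- distinct_resamples(n)
  prob <- apply(counts, 1, dmultinom, prob = rep(1 / n, n))
  stat <- apply(counts, 1, function(k) statistic(rep(x, times = k)))
  list(stat = stat, prob = prob)
}

x  <- c(9.8, 10.1, 10.3, 10.4, 12.9)      # made-up data with one straggler
eb <- exact_bootstrap(x, mean)
m  <- sum(eb$prob * eb$stat)              # exact bootstrap mean of the statistic
v  <- sum(eb$prob * (eb$stat - m)^2)      # exact bootstrap variance
```

For the mean, v can be checked against the closed form sum((x - mean(x))^2) / n^2; for other statistics (median, trimmed mean) the same weighted sums apply with a different statistic argument.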

Steve Ellison




__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Distinct combinations for bootstrapping small sets

2007-03-06 Thread Marc Schwartz
On Tue, 2007-03-06 at 15:54 +, S Ellison wrote:
> Small data sets (6-12 values, or a similarly small number of groups)
> which don't look nice and symmetric are quite common in my field
> (analytical chemistry and biological variants thereof), and often
> contain outliers or at least stragglers that I cannot simply discard.
> One of the things I occasionally do when I want to see what different
> assumptions do to my confidence intervals is to run a quick
> nonparametric bootstrap, just to get a feel for how asymmetric the
> distribution of any estimates might be. At the moment, I'm also
> interested in doing that on some historical data to evaluate some
> proposed estimators for interlab studies.
> 
> boot() is pretty good, but it's obvious that with such small sets,
> there aren't really many distinct resampled combinations (e.g. 92378 for
> 10 data points). So I'm really resampling from quite a small
> population of possible bootstrap samples. It's surely more efficient to
> generate all the different (resampled) combinations of the data set,
> and use those and their frequencies to get things like the bootstrap
> variance exactly. At worst, that'll stop us fooling ourselves into
> thinking more replicates will get better info.
> 
> A lengthy dig around R-help and CRAN turned up a blank on generating
> distinct combinations with resampling, so I've written a couple of
> routines to generate the distinct combinations and their frequencies.
> (They work, though I wouldn't guarantee great efficiency.) But if a
> chemist (me) can think of it, it's pretty certain that a statistician
> already has. Before I spend hours polishing code, is there already
> something out there I've missed?
> 
> Steve Ellison

Steve,

The phrase that you seem to be looking for is "permutation test".

If you use the following in R:


  RSiteSearch("{permutation test}", restrict = "functions")


that will lead you to some of the functions available.  

One CRAN package specifically, 'coin', has a permutation framework for a
variety of such tests.

HTH,

Marc Schwartz
