Re: [R] bootstrap confidence intervals with previously existing bootstrap sample

2007-09-05 Thread Tim Hesterberg
On Tue, 4 Sep 2007, [EMAIL PROTECTED] wrote:
 I am new to R. I would like to calculate bootstrap confidence intervals
 using the BCa method for a parameter of interest. My situation is this: I
 already have a set of 1000 bootstrap replicates created from my original
 data set. I have already calculated the statistic of interest for each
 bootstrap replicate, and have also calculated the mean for this statistic
 across all the replicates. Now I would like to calculate Bca confidence
 intervals for this statistic. Is there a way to import my
 previously-calculated set of 1000 statistics into R, and then calculate
 bootstrap confidence intervals around the mean from this imported data?

 I have found the code for boot.ci in the manual for the boot package, but
 it looks like it requires that I first use the boot function, and then
 apply the output to boot.ci. Because my bootstrap samples already exist,
 I don't want to use boot, but just want to import the 1000 values I have
 already calculated, and then get R to calculate the mean and Bca confidence
 intervals based on these values. Is this possible?

Brian Ripley wrote:
Yes, it is possible but you will have to study the internal structure of 
an object of class boot (which is documented on the help page) and mimic 
it.  You haven't told us which type of bootstrap you used, which is one of 
the details you need to supply.

It might be slightly easier to work with function bcanon in package 
bootstrap, which you would need to edit to suit your purposes.

I don't know why you have picked on the BCa method: my experience is that 
if you need to correct the basic method you often need far more than 1000 
samples to get reliable results.

You can do the BCa, but you need to supply parameters:
z0: typically calculated from the fraction of bootstrap statistics
    that are <= the original statistic
acceleration: based on the skewness of the empirical influence function,
    typically calculated using the jackknife
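
Here is a hedged sketch of how you could compute BCa limits directly from
an imported vector of bootstrap statistics.  This is not boot.ci or bcanon;
theta.star, theta.hat, x and theta.fun are placeholders for your own
objects, and it assumes a simple one-sample statistic so that the jackknife
for the acceleration is straightforward.

bcaFromReplicates <- function(theta.star, theta.hat, x, theta.fun, conf = 0.95)
{
  alpha <- (1 - conf) / 2
  # bias correction: fraction of bootstrap statistics <= the original statistic
  z0 <- qnorm(mean(theta.star <= theta.hat))
  # acceleration from the jackknife (skewness of the empirical influence values)
  n <- length(x)
  theta.jack <- sapply(seq_len(n), function(i) theta.fun(x[-i]))
  d <- mean(theta.jack) - theta.jack
  a <- sum(d^3) / (6 * sum(d^2)^1.5)
  # BCa-adjusted quantile levels, then quantiles of the bootstrap distribution
  zalpha <- qnorm(c(alpha, 1 - alpha))
  adj <- pnorm(z0 + (z0 + zalpha) / (1 - a * (z0 + zalpha)))
  quantile(theta.star, probs = adj, names = FALSE)
}
# e.g.  theta.star <- scan("myBootstrapStats.txt")  # the 1000 imported values
#       bcaFromReplicates(theta.star, theta.fun(x), x, theta.fun)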

I agree that you should do far more than 1000 samples.  The BCa uses
bootstrap quantiles that are adjusted based on the z0 and acceleration
parameters, and estimating z0 from the bootstrap samples magnifies
the Monte Carlo error.  You need roughly twice as many bootstrap samples
as for the bootstrap percentile interval, e.g. 10^4 instead of 5000.

If computational expense is an issue, you might prefer bootstrap
tilting intervals, which require about 1/37 as many bootstrap samples
as the BCa for comparable Monte Carlo variability.

Quick overview of confidence intervals:

                      accuracy     comments
t intervals           1/sqrt(n)    Using either a formula or bootstrap
                                   standard error; poor in the presence
                                   of skewness.
bootstrap percentile  1/sqrt(n)    Good quick-and-dirty procedure.
                                   Partial skewness correction.
                                   Poor if the statistic is biased.
bootstrap t           1/n          Good coverage, but interval width
                                   can vary wildly when n is small.
BCa                   1/n          Current best overall, but you need
                                   a lot of bootstrap samples, e.g. 10^4.
tilting               1/n          Low Monte Carlo variability, so it can
                                   use fewer bootstrap samples.
                                   Difficult to implement, and requires
                                   that the statistic can be calculated
                                   with weights.

Advertisement 1: tilting is available in S+Resample, free from
www.insightful.com/downloads/libraries

Advertisement 2: I talk about these more in my short course,
Bootstrap Methods and Permutation Tests
Oct 10-11 San Francisco, 3-4 Oct UK.
http://www.insightful.com/services/training.asp


| Tim Hesterberg   Senior Research Scientist   |
| [EMAIL PROTECTED]  Insightful Corp.|
| (206)802-2319        1700 Westlake Ave. N, Suite 500 |
| (206)283-8691 (fax)  Seattle, WA 98109-3044, U.S.A.  |
|  www.insightful.com/Hesterberg   |

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] small sample techniques

2007-08-08 Thread Tim Hesterberg
About using t tests and confidence intervals for large samples:
'large' may need to be very large.
The old pre-computer-age rule of n >= 30 is inadequate.

For example, for an exponential distribution, the actual size
of a nominal 2.5% one-sided t-test is not accurate to within 10%
(i.e. between 2.25% and 2.75%) until n is around 5000.
The error (actual - nominal size) decreases very slowly, at the rate 1/sqrt(n).
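
If you want to check this sort of thing yourself, here is a rough
simulation sketch; the sample sizes shown and the number of simulated
data sets are arbitrary, and the latter controls the Monte Carlo error
in the estimated size:

actualSize <- function(n, nsim = 10000) {
  # actual rejection rate of a nominal 2.5% one-sided t test
  # when the data are exponential with mean 1 (so H0 is true)
  reject <- replicate(nsim,
    t.test(rexp(n), mu = 1, alternative = "less")$p.value < 0.025)
  mean(reject)
}
# actualSize(30)    # typically noticeably off 0.025
# actualSize(5000)  # much closer to the nominal level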

In practice, real distributions may be even more skewed than
the exponential distribution, even though they appear less skewed,
if they have long tails.  In this case the sample size would need
to be even larger for t procedures to be reasonably accurate.

An alternative is to use bootstrapping.  Bootstrap procedures that
decrease at the rate 1/n include bootstrap t, BCa, and bootstrap
tilting.

Moshe Olshansky [EMAIL PROTECTED] wrote:
If the two populations are normal the t-test gives you
the exact result for whatever the sample size is (the
sample size will affect the number of degrees of
freedom).
When the populations are not normal and the sample
size is large it is still OK to use t-test (because of
the Central Limit Theorem) but this is not necessarily
true for the small sample size.
You could use simulation to find the relevant
probabilities.
...


| Tim Hesterberg   Senior Research Scientist   |
| [EMAIL PROTECTED]  Insightful Corp.|
| (206)802-2319        1700 Westlake Ave. N, Suite 500 |
| (206)283-8691 (fax)  Seattle, WA 98109-3044, U.S.A.  |
|  www.insightful.com/Hesterberg   |

Short course - Bootstrap Methods and Permutation Tests
Oct 10-11 San Francisco, 3-4 Oct UK.
http://www.insightful.com/services/training.asp

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Weighted least squares

2007-06-11 Thread Tim Hesterberg
As John noted, there are different kinds of weights, and 
different terminology:
(1) inverse-variance weights (accuracy weights)
(2) case weights (frequencies, counts)
(3) sampling weights (selection probability weights)

I'll add:
(4) inverse-variance weights, where var(y for observation) = 1/weight
    (as opposed to just being inversely proportional to the weight)
(5) weights used as part of an algorithm (e.g. for robust estimation,
    or glm's using iteratively-reweighted least squares).

For linear regression, the type of weights doesn't affect the regression
coefficient calculations, but it does affect inferences such as standard
errors for the regression coefficients, degrees of freedom for variance
estimates, etc.

lm() inferences assume the first type.  
Other formulae are appropriate for inferences for types 2-4.
Combinations of types 1-4 require other formulae; this gets nontrivial.
For the 5th type, inferences need to be handled by the algorithm that
is using weighted linear regression.
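
Here is a small made-up illustration of the type (1) vs type (2) distinction
(just a sketch, with fabricated data): lm() with frequency counts supplied as
weights reproduces the coefficients from the case-expanded data, but not the
standard errors, because its inferences treat the weights as inverse variances.

set.seed(1)
agg <- data.frame(x = 1:5,
                  y = 2 + 3 * (1:5) + rnorm(5),
                  w = c(3, 1, 2, 4, 2))            # w = case counts
expanded <- agg[rep(1:nrow(agg), agg$w), ]         # one row per case
fitW <- lm(y ~ x, data = agg, weights = w)         # weights treated as type (1)
fitE <- lm(y ~ x, data = expanded)                 # true case-weight analysis
coef(fitW) - coef(fitE)                            # essentially zero
summary(fitW)$coefficients[, "Std. Error"]         # differs from ...
summary(fitE)$coefficients[, "Std. Error"]         # ... the case-weight SEs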

Tim Hesterberg

John Fox wrote:
I think that the problem is that the term weights has different meanings,
which, although they are related, are not quite the same. 

The weights used by lm() are (inverse-)variance weights, reflecting the
variances of the errors, with observations that have low-variance errors
therefore being accorded greater weight in the resulting WLS regression.
What you have are sometimes called case weights, and I'm unaware of a
general way of handling them in R, although you could regenerate the
unaggregated data. As you discovered, you get the same coefficients with
case weights as with variance weights, but different standard errors.
Finally, there are sampling weights, which are inversely proportional to
the probability of selection; these are accommodated by the survey package. 

To complicate matters, this terminology isn't entirely standard.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] sort of OT: bootstrap tutorial

2007-03-12 Thread Tim Hesterberg
Another introductory document is
Bootstrap Methods and Permutation Tests
http://bcs.whfreeman.com/ips5e/content/cat_080/pdf/moore14.pdf
This focuses on the methods, not on particular software.

There is an accompanying library at
http://www.insightful.com/Hesterberg/bootstrap/IPSdata.zip
that includes the datasets.  While this is for S+, R users could
adapt the data/readdata.ssc file to load the datasets.

The document is written for introductory statistics students, but
should be interesting to those who are more advanced.  For more info see
http://www.insightful.com/Hesterberg/bootstrap/default.asp#Reference.introStat

Patrick Burns wrote:
There is now a tutorial on bootstrapping and other resampling
methods at:

http://www.burns-stat.com/pages/Tutor/bootstrap_resampling.html


| Tim Hesterberg   Senior Research Scientist   |
| [EMAIL PROTECTED]  Insightful Corp.|
| (206)802-2319        1700 Westlake Ave. N, Suite 500 |
| (206)283-8691 (fax)  Seattle, WA 98109-3044, U.S.A.  |
|  www.insightful.com/Hesterberg   |

Download S+Resample from www.insightful.com/downloads/libraries

Bootstrap Methods and Permutation Tests
May 21-22 Philadelphia, Oct 10-11 San Francisco.
2-3 May UK, 3-4 Oct UK.
Workshop on resampling for teaching:
May 16 Ohio State 
http://www.causeweb.org/workshop/hesterberg/

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Row-wise two sample T-test on subsets of a matrix

2007-03-12 Thread Tim Hesterberg
This is the kind of thing that rowMeans was made for.
For the numerator of the t statistic:

x1 <- temp.matrix[,1:11]
x2 <- temp.matrix[,12:22]
numerator <- rowMeans(x1) - rowMeans(x2)

For the denominator, if you're using S+ you can use rowVars;
in R you can program a simple version quickly, e.g.

rowVars <- function(x){
  means <- rowMeans(x)
  rowSums((x - means)^2) / (ncol(x)-1)
}
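
To finish the job, here is a sketch of the rest of an unpaired,
pooled-variance row-wise t test (the original post used paired=TRUE;
for that, apply the same idea row-wise to the differences
temp.matrix[,1:11] - temp.matrix[,12:22]):

n1 <- ncol(x1)                      # 11
n2 <- ncol(x2)                      # 11
pooledVar <- ((n1 - 1) * rowVars(x1) + (n2 - 1) * rowVars(x2)) / (n1 + n2 - 2)
tstat <- numerator / sqrt(pooledVar * (1/n1 + 1/n2))
pvals <- 2 * pt(-abs(tstat), df = n1 + n2 - 2)   # two-sided p-values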

Tim Hesterberg

Hello all,

I am trying to run a two sample t-test on a matrix which is a
196002*22 matrix. I want to run the t-test, row-wise, with the
first 11 columns being a part of the first group and columns
12-22 being a part of the second group. 

I tried running something like (temp.matrix being my 196002*22
matrix)

t.test(temp.matrix[,1:11],temp.matrix[,12:22],paired=TRUE)

or something like

as.numeric(t.test(temp.matrix[,1:11],temp.matrix[,12:22],paired=TRUE)[[1]])
so as to capture only the t-value, and I get a result for the whole matrix
instead of a row-wise result.

I want to avoid using a for loop to increment the number of
rows as it would take a huge amount of time.


Any suggestions would be really appreciated.

thanks
nameeta


| Tim Hesterberg   Senior Research Scientist   |
| [EMAIL PROTECTED]  Insightful Corp.|
| (206)802-2319        1700 Westlake Ave. N, Suite 500 |
| (206)283-8691 (fax)  Seattle, WA 98109-3044, U.S.A.  |
|  www.insightful.com/Hesterberg   |

Advanced Programming in S-PLUS
30 Apr-1 May UK, 7-8 May CH, 9-10 May FR
May 17-18 Philadelphia, Oct 8-9 San Francisco
26-7 Sep CH, 1-2 Oct UK 
http://www.insightful.com/services/training.asp

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] bootstrap syntax + bootstrap for teaching

2007-01-31 Thread Tim Hesterberg
Previous subject:
bootstrap bca confidence intervals for large number of statistics in one model; 
library(boot)

Jacob Wegelin asked for an easier way to do many bootstrap confidence
intervals for regression output.
The syntax would be easier with S+Resample, example below.
You create an ordinary lm object, then do e.g.
boot <- bootstrap(fit, c(coef(fit), predict(fit, newdata=NEWDATA)))
limits.bca(boot)


 RESAMPLING FOR TEACHING 

I would encourage people to consider using S+Resample for teaching.
There is a free student version of S+, and S+Resample is easier for
students -- easier syntax (in general, not just this example), plus a
menu interface.  There are free chapters and packages you can use to
supplement a statistics course with resampling.  See:
http://www.insightful.com/Hesterberg/bootstrap/#Reference.introStat

I'll give a workshop on resampling for teaching at the USCOTS conference
(U.S. Conference On Teaching Statistics) on May 16.
http://www.causeweb.org/uscots
http://www.causeweb.org/workshop/hesterberg/


set.seed(0)
x <- runif(150)
y <- 2/3 + pi * x^2 + runif(length(x))/2
plot(x, y)
DAT <- data.frame(x, y)
NEWDATA <- data.frame(x=seq(min(x), max(x), length=50))
library(resample)
fit <- lm(y ~ x + I(x^2), data=DAT)
boot <- bootstrap(fit, c(coef(fit), predict(fit, newdata = NEWDATA)))
CIs <- limits.bca(boot)
lines(NEWDATA$x, CIs[4:53,1], col="red")
lines(NEWDATA$x, CIs[4:53,4], col="red")
CIs[1:3,]

I used x + I(x^2) in the model rather than poly(x,2), because poly is
data dependent, so its regression coefficients cannot be applied to new
data, and confidence intervals for the coefficients are not meaningful.

Comment - the default is 1000 bootstrap samples; that isn't really
enough for BCa intervals, because the BCa calculations magnify the
Monte Carlo standard error by roughly a factor of two.

Jacob Wegelin wrote:
Sometimes one might like to obtain pointwise bootstrap bias-corrected,
accelerated (BCA) confidence intervals for a large number of statistics
computed from a single dataset.  For instance, one might like to get
(so as to plot graphically) bootstrap confidence bands for the fitted
values in a regression model.

(Example: Chiu S et al., Early Acceleration of Head Circumference in
Children with Fragile X Syndrome and Autism. Journal of Developmental 
Behavioral Pediatrics 2007;In press. In this paper we plot the
estimated trajectories of head circumference for two different
groups, computed by linear mixed-effects models, with confidence bands
obtained by bootstrap.)

Below is a toy example of how to do this using the boot library.
We obtain BCA CIs for all three regression parameters and for the fitted
values at 50 levels of the predictor.

set.seed(1234567)
x <- runif(150)
y <- 2/3 + pi * x^2 + runif(length(x))/2
plot(x, y)
DAT <- data.frame(x, y)
NEWDATA <- data.frame(x=seq(min(x), max(x), length=50))
library('boot')
myfn <- function(data, whichrows) {
   TheseData <- data[whichrows,]
   thisLM <- lm( y~poly(x,2), data=TheseData)
   thisFit <- predict(thisLM, newdata=NEWDATA)
   c(
   coef(summary(thisLM))[,"Estimate"]
   , thisFit)
}
bootObj <- boot( data=DAT, statistic=myfn, R=1000 )
names(bootObj)
dim(bootObj$t)
sofar <- t(sapply( 1:ncol(bootObj$t), function(thiscolumn) boot.ci(bootObj,
type='bca', index=thiscolumn)$bca[4:5] ))
colnames(sofar) <- c("lo", "hi")
NEWDATA <- cbind(NEWDATA, sofar[4:nrow(sofar),])
thecoefs <- sofar[1:3,]
lines( NEWDATA$x, NEWDATA$lo, col='red')
lines( NEWDATA$x, NEWDATA$hi, col='red')

Note: To get boot.ci to run with type='bca' it seems necessary to have a
large value of R.  Otherwise boot.ci returns an error, in my (limited)
experience.

Thanks in advance for any critiques.  (For instance, is there an easier or 
more elegant way?)

Caveat - I helped create S+Resample.


| Tim Hesterberg   Senior Research Scientist   |
| [EMAIL PROTECTED]  Insightful Corp.|
| (206)802-2319        1700 Westlake Ave. N, Suite 500 |
| (206)283-8691 (fax)  Seattle, WA 98109-3044, U.S.A.  |
|  www.insightful.com/Hesterberg   |

Download S+Resample from www.insightful.com/downloads/libraries

Bootstrap short courses:  May 21-22 Philadelphia, Oct 10-11 San Francisco.
  2-3 May UK, 3-4 Oct UK.
http://www.insightful.com/services/training.asp

Workshop on resampling for teaching:  May 16 Ohio State 
http://www.causeweb.org/workshop/hesterberg/

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Bootstrapping Confidence Intervals for Medians

2007-01-30 Thread Tim Hesterberg
This is in followup to an earlier message about bootstrap confidence
intervals for the difference in two medians.  I'll touch on three
topics:
* bootstrap theory for one vs two samples
* special case of median
* bootstrap small samples


ONE VS TWO SAMPLES

Brian Ripley wrote (the complete posting is below):
The 'standard' theory of bootstrap confidence intervals as implemented in 
e.g. package 'boot' is for a single-sample problem (and it would be 
pushing its justification very hard to use this for n=9).  But you have 
two samples, and haven't told us how you intend to bootstrap.  I guess you 
mean a stratified bootstrap, sampling with replacement independently from 
observations 1-4 and 5-9.  I don't know of theory for bootstrap confidence 
intervals from that scenario: do you?

Much of the bootstrap theory was developed for a single sample,
but carries over fairly easily to two samples or other stratified
sampling.  Davison and Hinkley do talk about stratified sampling
in their book, and Canty implemented that in the 'boot' package.
S+Resample also supports stratified sampling, with a front-end
function 'bootstrap2' to make the two-sample case easier.
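
For what it's worth, here is a hedged sketch of the stratified version
using the boot package and the 9 observations quoted at the bottom of
this message; R=2000 and the 90% percentile interval are arbitrary
choices, and the caveats below about medians in tiny samples still apply:

library(boot)
y <- c(0.21, 0.16, 0.32, 0.69, 1.15, 0.9, 0.87, 0.87, 0.73)
grp <- rep(1:2, c(4, 5))            # group 1 = obs 1-4, group 2 = obs 5-9
medDiff <- function(data, i) {
  d <- data[i, ]
  median(d$y[d$grp == 1]) - median(d$y[d$grp == 2])
}
b <- boot(data.frame(y = y, grp = grp), medDiff, R = 2000, strata = grp)
boot.ci(b, conf = 0.90, type = "perc")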


SPECIAL CASE OF MEDIANS

Beyond this, there are considerable problems with bootstrapping medians in 
small samples as the median is a non-smooth function of the data and the 
bootstrap samples take very few values.  See for example the galaxies 
dataset as discussed in MASS4.  For the stratified bootstrapping I 
referred to, there are only a handful of possible values of each of the 
medians and so the bootstrap distribution is a highly non-uniform one on a 
few values.  E.g.

Bootstrapping medians does present special difficulties.  Particularly
with a single sample and n odd, the bootstrap distribution is often a
very inaccurate estimate for the sampling distribution; it takes on
only a few values.

Curiously, however, bootstrap percentile intervals for a single median
are not bad.  They are similar to classical nonparametric intervals
based on order statistics, and use either the same or an adjacent
order statistic; where they differ, the classical interval is
conservative (one-sided non-coverage less than 0.025, often much
less), and the bootstrap interval is one order statistic narrower, with
closer to the nominal coverage.


BOOTSTRAPPING SMALL VS MEDIUM SAMPLES

People tend to think of bootstrapping for small samples, where they
don't trust the central limit theorem, like n=9.  That is misguided.
They should use the bootstrap in medium size samples, where
they shouldn't trust the central limit theorem, like perhaps n=20
(depending on the application) to n=1000.  

For very small samples the data distribution does not reliably reflect
the shape of the population, and it is better to impose parametric
assumptions.  In other words, with very small samples you can't accurately
estimate skewness, so it may be better to use classical methods that just
assume that skewness is zero.

Conversely, for medium size samples one should not just assume that
skewness is zero, but instead use bootstrap or other methods that
allow for skewness.  Yes, n=1000 is a medium size sample -- the effect
of skewness on confidence intervals and tests decreases very slowly,
with size or coverage errors of O(1/sqrt(n)) for classical t tests and
confidence intervals.

One beauty of the bootstrap is that it provides a nice way to estimate
what size errors you make by assuming sampling distributions are normal --
by creating a bootstrap distribution and seeing how non-normal it is.
It provides a nice alternative to the pre-computer rule of using
normal methods if n >= 30.
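
A minimal sketch of that diagnostic, with made-up skewed data (10^4
resamples is an arbitrary choice):

set.seed(2)
x <- rexp(100)                               # a skewed sample
bootMeans <- replicate(10000, mean(sample(x, replace = TRUE)))
hist(bootMeans)                              # visibly skewed?
qqnorm(bootMeans); qqline(bootMeans)         # curvature indicates non-normality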

Tim Hesterberg


| Tim Hesterberg   Senior Research Scientist   |
| [EMAIL PROTECTED]  Insightful Corp.|
| (206)802-2319        1700 Westlake Ave. N, Suite 500 |
| (206)283-8691 (fax)  Seattle, WA 98109-3044, U.S.A.  |
|  www.insightful.com/Hesterberg   |

Download S+Resample from www.insightful.com/downloads/libraries
Bootstrap short courses:  May 21-22 Philadelphia, Oct 10-11 San Francisco.
  2-3 May UK, 3-4 Oct UK.
(I do talk more about these issues in the short courses.)

Brian Ripley wrote:
On Sat, 6 Jan 2007, [EMAIL PROTECTED] wrote:

 I apologize for this post. I am new to R (two days) and I have tried and 
 tried to calculated confidence intervals for medians. Can someone help 
 me?

Later, you say you want a confidence interval for a difference in medians, 
not the same thing.

For medians, see MASS4 section 5.7 for worked examples and discussion of 
the pitfalls.

 Here is my data:

 institution1
 0.21
 0.16
 0.32
 0.69
 1.15
 0.9
 0.87
 0.87
 0.73

 The first four observations compose group 1 and observations 5 through 9 
 compose group 2. I would like to create a bootstrapped 90% confidence 
 interval on the difference

Re: [R] Time-varying correlation calculation

2007-01-23 Thread Tim Hesterberg
You can do this using an interaction of bs(time) and x, e.g.:

# Time-varying coefficient
x <- rnorm(100)
time <- ppoints(100)
y <- sin(time) + cos(time)*x + rnorm(100)/10

library(splines)
fit <- lm(y ~ bs(time, df=5) * x)
fit
# Plot estimated intercept vs time
plot(time, coef(fit)[1] + bs(time, df=5) %*% coef(fit)[2:6])
lines(time, sin(time))
# Plot estimated slope vs time
plot(time, coef(fit)[7] + bs(time, df=5) %*% coef(fit)[8:12])
lines(time, cos(time))

Thanks to Trevor Hastie for suggesting this approach, when
I made a similar query on S-news years ago.

Tim Hesterberg

  I'm interested in getting a series of time-varying correlation, simply 
 between two random variables.
   
  Could you please introduce a package to do this task?
   
  Thank you so much for any help.
  Amir 


| Tim Hesterberg   Senior Research Scientist   |
| [EMAIL PROTECTED]  Insightful Corp.|
| (206)802-2319        1700 Westlake Ave. N, Suite 500 |
| (206)283-8691 (fax)  Seattle, WA 98109-3044, U.S.A.  |
|  www.insightful.com/Hesterberg   |

Download S+Resample from www.insightful.com/downloads/libraries

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Matrix multiplication using apply() or lappy() ?

2006-09-07 Thread Tim Hesterberg
[EMAIL PROTECTED] asked:
I am trying to divide the columns of a matrix by the first row in the 
matrix.

Dividing columns of a matrix by a vector is a pretty fundamental
operation, and the query resulted in a large number of suggestions:

x/matrix(v, nrow(x), ncol(x), byrow = TRUE)
sweep(x, 2, v, "/")
x / rep(v, each = nrow(x))
x / outer(rep(1, nrow(x)), v)
x %*% diag(1/v)
t(apply(x, 1, function(x) x/v))
x/rep(v, each=nrow(x))
t(apply(x, 1, "/", v))
library(reshape); iapply(x, 1, "/", v)  # R only
t(t(x)/v)
scale(x, center = FALSE, v)  # not previously suggested


It is unsatisfactory when such a fundamental operation is
done in so many different ways.  
* It makes it hard to read other people's code.  
* Some of these are very inefficient.

I propose to create standard functions and possibly operator forms
for this and similar operators:

colPlus(x, v)   x %c+% v
colMinus(x, v)  x %c-% v
colTimes(x, v)  x %c*% v
colDivide(x, v) x %c/% v
colPower(x, v)  x %c^% v

Goals are:
* more readable code
* generic functions, with methods for objects such as data frames
  and S-PLUS bigdata objects  (this would be for both S-PLUS and R)
* efficiency -- use the fastest of the above methods, or drop to C
  to avoid replicating v.
* allow error checking (that length of v matches number of columns of x)

I'd like feedback (to me, I'll summarize for the list) on:
* the suggestion in general
* are names like colPlus OK, or do you have other suggestions?
* create both functions and operators, or just the functions?
* should there be similar operations for rows?  

Note:  similar operations for rows are not usually needed, because
x * v  # e.g. where v = rowMeans(x)
is equivalent to (but faster than)
x * rep(v, length = length(x))
The advantage would be that
rowTimes(x, v)
could throw an error if length(v) != nrow(x).

Tim Hesterberg

P.S.  Of the suggestions, my preference is
a / rep(v, each=nrow(a))
It was to support this and similar +-*^ operations that I originally
added the each argument to rep.
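
For concreteness, here is a hypothetical sketch of one of the proposed
functions (none of this exists yet), built on that preference and with
the length check mentioned above:

colDivide <- function(x, v) {
  # divide column j of x by v[j], checking that the lengths match
  if (length(v) != ncol(x))
    stop("length(v) must equal ncol(x)")
  x / rep(v, each = nrow(x))
}
m <- matrix(1:6, nrow = 2)
colDivide(m, m[1, ])          # divide each column of m by its first row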


| Tim Hesterberg   Research Scientist  |
| [EMAIL PROTECTED]  Insightful Corp.|
| (206)802-2319        1700 Westlake Ave. N, Suite 500 |
| (206)283-8691 (fax)  Seattle, WA 98109-3044, U.S.A.  |
|  www.insightful.com/Hesterberg   |

Download the S+Resample library from www.insightful.com/downloads/libraries

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Is there a way to view S-PLUS script files in R

2006-03-17 Thread Tim Hesterberg
You can open them in R.  On Windows, use File:Open Script,
change 'Files of type' to 'All Files', then open the .ssc file.
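
Alternatively, .ssc files are plain text, so on any platform you can view
one from the console; the file name here is just a placeholder:

file.show("myscript.ssc")
# or
cat(readLines("myscript.ssc"), sep = "\n")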

Tim Hesterberg

I have some S-PLUS script files (.ssc).  Does there exist an R
function/command  that can read such files?  I simply want to view the
code and practice in R to help me learn the subject matter.

Any help would be greatly appreciated.

platform i386-pc-mingw32
...

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] permutation test for linear models with continuous covariates

2005-12-12 Thread Tim Hesterberg
What do you want to test?

To test H0:  Y is independent of all X's, 
you can permute Y.

To test H0:  a particular X does not contribute to predicting Y, conditional
on the other X's, you have to be careful.  If you permute that X, the size
is wrong; the
type I error can be nearly 50%, because you've lost the correlation
between that X and others.  I don't know of a suitable permutation test
in general for testing individual X's.
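
For the first null hypothesis, here is a hedged sketch (permFTest is not
an existing function); it permutes y and recomputes the overall F
statistic, and could be applied to the x, bmi and y simulated in the
quoted message below:

permFTest <- function(y, X, nperm = 999) {
  # X: data frame of predictors; permute y to test independence from all of them
  Fobs <- summary(lm(y ~ ., data = X))$fstatistic[1]
  Fperm <- replicate(nperm,
    summary(lm(sample(y) ~ ., data = X))$fstatistic[1])
  mean(c(Fobs, Fperm) >= Fobs)       # permutation p-value
}
# permFTest(y, data.frame(x = x, bmi = bmi))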

S+Resample has a permutationTest() function that lets you specify
which columns to permute.  The Canty/Davison/Hinkley boot library
(in R or S-PLUS) offers permutation as one of the options for sampling
in the boot() function; to use this you would specify the Y variable
as the data set and pass the X's separately to the statistic.

Tim Hesterberg

Hi I was wondering if there is a permutation test available in R for linear 
models with continuous dependent covariates. I want to do a test like the 
one shown here.

bmi <- rnorm(100, 25)
x <- c(rep(0,75), rep(1,25))
y <- rnorm(100) + bmi^(1/2) + rnorm(100,2)*x + bmi*x

H0 <- lm(y ~ 1 + x + bmi)
H1 <- lm(y ~ 1 + x + bmi + x*bmi)
anova(H0, H1)
summary(lm(y ~ 1 + x + bmi))


But I want to use permutation testing to avoid an inflated p-value due to a
y that is not quite normally distributed, and I do not want to log-transform
y.

Thanks

Anders


| Tim Hesterberg   Research Scientist  |
| [EMAIL PROTECTED]  Insightful Corp.|
| (206)802-2319        1700 Westlake Ave. N, Suite 500 |
| (206)283-8691 (fax)  Seattle, WA 98109-3012, U.S.A.  |
|  www.insightful.com/Hesterberg   |

Download the S+Resample library from www.insightful.com/downloads/libraries

Two Research Scientist positions:
data mining
frailty/mixed effects
http://www.insightful.com/company/jobs.asp

Speak out about biased science in Washington D.C.
http://home.comcast.net/~timhesterberg/ScientificIntegrity.html

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Method for $

2005-11-21 Thread Tim Hesterberg
Ulrike Groemping [EMAIL PROTECTED] wrote:

I have defined a class myclass and would like the slots to be extractable 
not only by @ but also by $. I now try to write a method for $ that 
simply executes the request object@slotname, whenever someone calls 
object$slotname for any object of class myclass.
I don't manage to find out how I can provide this function with slotname, 
so that one and the same function works for any arbitrary slotname a user 
might choose.
...

I would caution against defining methods for $.

In addition to Martin Maechler's and Duncan Temple Lang's warnings
about the danger of this, I would note that it could make R run much
slower.  I once tried defining a method for it in S-PLUS; that
converted $ into a generic function, which slowed down every call to $,
of which there are many.

Even if it wouldn't slow down R, as we work to make R and S-PLUS more
compatible you or someone else might try your code in S-PLUS, and
cause a big speed hit.

Maybe I could (and should?) have defined the class with just one slot
that contains the list, which would make it behave like I want it
immediately.

Why not make it a list with an S3 class, rather than an S4 class?
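
For example (just a sketch, with made-up component names):

myobj <- structure(list(estimate = 1.23, se = 0.45), class = "myclass")
myobj$estimate                            # $ just works, no new method needed
print.myclass <- function(x, ...)         # S3 methods as usual
  cat("estimate", x$estimate, "with se", x$se, "\n")
myobj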

Tim Hesterberg


| Tim Hesterberg   Research Scientist  |
| [EMAIL PROTECTED]  Insightful Corp.|
| (206)802-2319        1700 Westlake Ave. N, Suite 500 |
| (206)283-8691 (fax)  Seattle, WA 98109-3012, U.S.A.  |
|  www.insightful.com/Hesterberg   |

Download the S+Resample library from www.insightful.com/downloads/libraries

Two Research Scientist positions:
data mining
frailty/mixed effects
http://www.insightful.com/company/jobs.asp

Speak out about biased science in Washington D.C.
http://home.comcast.net/~timhesterberg/ScientificIntegrity.html

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Stratified Bootstrap question

2005-04-01 Thread Tim Hesterberg
Qian wrote:
I talked with my advisor yesterday about how to do bootstrapping for my
scenario: random clinic + random subject within clinic. She suggested that
only clinics are independent units, so I can only resample clinics. But I
think that since subjects are also independent within clinic, shall I
resample subjects within clinic, which means I have two-stage resampling?
Which one do you think makes sense?

This is a tough issue; I don't have a complete answer.  I'd
appreciate input from other r-help readers.

If you randomly select clinics, then randomly select patients within
the clinics:
  (1) by bootstrapping just clinics, you capture both sources of
  variation -- the between-subject variation is incorporated in the
  results for each clinic.
  
  (2) by bootstrapping clinics, then subjects within clinics, you
  end up double-counting the between-subject variation
That argues for resampling just clinics.
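
Here is a hedged sketch of option (1), resampling whole clinics and
relabeling the duplicates so that downstream model fits treat them as
distinct (the data frame and the column name 'clinic' are placeholders):

resampleClinics <- function(data, cluster = "clinic") {
  ids <- unique(data[[cluster]])
  picked <- ids[sample(length(ids), replace = TRUE)]
  # give each resampled clinic a new label so duplicated clinics stay distinct
  do.call(rbind, lapply(seq_along(picked), function(k) {
    d <- data[data[[cluster]] == picked[k], ]
    d[[cluster]] <- k
    d
  }))
}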

By analogy, if you have multiple subjects, and multiple measurements
per subject, you should just resample subjects.

However, I'm not comfortable with this if you have a small number of
clinics, and relatively large numbers of patients in each clinic, and
think that the between-clinic variation should be small.  Then it
seems better to resample both clinics and patients.

I'm leery about resampling just clinics if there are a small number
of clinics.  Bootstrapping isn't particularly effective for small
samples -- it is subject to skewness in small samples, and it 
underestimates variances (its advantages over classical methods
really show up with medium size samples).
There are remedies for the small variance, see
Hesterberg, Tim C. (2004), Unbiasing the Bootstrap-Bootknife Sampling
vs. Smoothing, Proceedings of the Section on Statistics and the
Environment, American Statistical Association, 2924-2930
www.insightful.com/Hesterberg/articles/JSM04-bootknife.pdf

Tim Hesterberg


| Tim Hesterberg   Research Scientist  |
| [EMAIL PROTECTED]  Insightful Corp.|
| (206)802-2319        1700 Westlake Ave. N, Suite 500 |
| (206)283-8691 (fax)  Seattle, WA 98109-3044, U.S.A.  |
|  www.insightful.com/Hesterberg   |

Download the S+Resample library from www.insightful.com/downloads/libraries

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Stratified Bootstrap question

2005-03-31 Thread Tim Hesterberg
Dear Qian,

Yes: when bootstrap sampling by subject, if a subject is included
in a bootstrap dataset multiple times, you must give that subject
different IDs, or some statistics will be incorrect.

Here's what S+Resample does (from help(bootstrap.args)):
 If subject is the name of a variable in the data frame (for
 example data=Orthodont, subject=Subject), then bootstrap makes
 resampled subjects unique; that is, duplicated subjects in a
 given resample are assigned distinct subject values in the
 resampled data frame before the statistic is evaluated; this is
 useful for longitudinal and other modeling where the statistic
 expects subjects to have unique values.

Tim Hesterberg


Dear Tim,

Thank you so much for your help. My random mixed model is as follows:

b.lme <- lme(sbp ~ age + gender, data=bdat, random = ~ 1 | clinic/id,
             na.action=na.omit)

When doing the bootstrap with stratum clinic, a patient's data may appear
multiple times in the bootstrap dataset and all of them share the same id.
I am wondering if the data from the same patient will cause problems in
the lme fitting.  Do you happen to know?

Sincerely yours,
Qian


| Tim Hesterberg   Research Scientist  |
| [EMAIL PROTECTED]  Insightful Corp.|
| (206)802-2319        1700 Westlake Ave. N, Suite 500 |
| (206)283-8691 (fax)  Seattle, WA 98109-3044, U.S.A.  |
|  www.insightful.com/Hesterberg   |

Download the S+Resample library from www.insightful.com/downloads/libraries

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] Stratified Bootstrap question

2005-03-30 Thread Tim Hesterberg
Dear Qian,

You might try the S+Resample library, which has built-in support
for both sampling by subject and stratified sampling.

If you are a student, there is a free student version of S+.

See
www.insightful.com/downloads/libraries  (S+Resample)
www.insightful.com/Hesterberg/bootstrap (has link to the student version)

For the missing values, consider the S+Missing library,
which offers multiple imputation.  With S+, do
library(missing)

Tim Hesterberg

P.S.  The combination of sampling by subject and stratified sampling
was terribly messy to program.  If I'd known in advance how messy, I
never would have done it :-(  But it is done now.

Dear R users,

I have a question regarding the stratified bootstrap and how to implement it
using boot() in R's boot package.

My dataset is a longitudinal dataset (3 measurements per person at year
1, 4 and 7) composed of multiple clinic centers and multiple participants
within each clinic. It has missing values.

I want to do a bootstrap to find the standard errors and confidence
intervals for my variance components. My model is a mixed model with
random clinic and random participant within clinic.

I thought two methods to do bootstrap:
(1) bootstrap the data; however, I have a problem specifying the second
parameter of my statistic function: shall I use indices, weights, or
frequencies, and how do I relate them to my dataset?
(2) bootstrap residuals; however, the dataset has multiple measurements
and missing values. I am wondering how to construct a new data frame
containing the residuals and fitted values.

Any ideas will be highly appreciated!
Sincerely yours,
Qian


| Tim Hesterberg   Research Scientist  |
| [EMAIL PROTECTED]  Insightful Corp.|
| (206)802-2319        1700 Westlake Ave. N, Suite 500 |
| (206)283-8691 (fax)  Seattle, WA 98109-3044, U.S.A.  |
|  www.insightful.com/Hesterberg   |

Download the S+Resample library from www.insightful.com/downloads/libraries

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] Re: [S] January advanced R/Splus course in Boston?

2004-01-19 Thread Tim Hesterberg
I learnt there's an advanced R/Splus course in Boston
this January. Anyone got the announcement? Please
kindly forward it to me.

I'm giving an Advanced Programming in S-PLUS course in Boston and San
Francisco in March.  See:
http://www.insightful.com/services/training.asp
http://www.insightful.com/services/schedule.asp
http://www.insightful.com/services/course.asp?CID=37

I'm also giving a Bootstrap Methods and Permutation Tests course,
same places.

Tim Hesterberg


| Tim Hesterberg   Research Scientist  |
| [EMAIL PROTECTED]  Insightful Corp.|
| (206)802-2319        1700 Westlake Ave. N, Suite 500 |
| (206)802-2500 (fax)  Seattle, WA 98109-3044, U.S.A.  |
|  www.insightful.com/Hesterberg   |

__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html