Re: [R] bootstrap confidence intervals with previously existing bootstrap sample
On Tue, 4 Sep 2007, [EMAIL PROTECTED] wrote:

I am new to R. I would like to calculate bootstrap confidence intervals using the BCa method for a parameter of interest. My situation is this: I already have a set of 1000 bootstrap replicates created from my original data set. I have already calculated the statistic of interest for each bootstrap replicate, and have also calculated the mean of this statistic across all the replicates. Now I would like to calculate BCa confidence intervals for this statistic. Is there a way to import my previously calculated set of 1000 statistics into R, and then calculate bootstrap confidence intervals around the mean from this imported data? I have found the code for boot.ci in the manual for the boot package, but it looks like it requires that I first use the boot function and then apply its output to boot.ci. Because my bootstrap samples already exist, I don't want to use boot; I just want to import the 1000 values I have already calculated and have R calculate the mean and BCa confidence intervals based on these values. Is this possible?

Brian Ripley wrote:

Yes, it is possible, but you will have to study the internal structure of an object of class "boot" (which is documented on the help page) and mimic it. You haven't told us which type of bootstrap you used, which is one of the details you need to supply. It might be slightly easier to work with the function bcanon in package bootstrap, which you would need to edit to suit your purposes. I don't know why you have picked the BCa method: my experience is that if you need to correct the basic method you often need far more than 1000 samples to get reliable results.
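For reference, the BCa computation can be done directly from pre-computed replicates, without a boot object. A hedged sketch follows (function and variable names here are illustrative, not from the boot package; it assumes you can also compute jackknife leave-one-out values from the original data, which are needed for the acceleration):

```r
# Sketch: BCa interval from pre-computed bootstrap statistics.
# theta.hat:  statistic on the original data
# theta.star: vector of bootstrap replicate statistics
# theta.jack: jackknife leave-one-out statistics (for the acceleration)
bca.ci <- function(theta.hat, theta.star, theta.jack, conf = 0.95) {
  alpha <- (1 - conf) / 2
  # z0: bias correction, from the fraction of replicates <= the original
  z0 <- qnorm(mean(theta.star <= theta.hat))
  # acceleration: skewness of the jackknife (empirical influence) values
  u <- mean(theta.jack) - theta.jack
  a <- sum(u^3) / (6 * sum(u^2)^1.5)
  # BCa-adjusted quantile levels, then the corresponding quantiles
  zalpha <- qnorm(c(alpha, 1 - alpha))
  p <- pnorm(z0 + (z0 + zalpha) / (1 - a * (z0 + zalpha)))
  quantile(theta.star, probs = p, names = FALSE)
}
```

This mirrors the standard BCa formulas; with 1000 replicates, expect considerable Monte Carlo error in the endpoints, per the discussion below.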
You can do the BCa, but you need to supply parameters:

* z0: the bias correction, typically calculated from the fraction of bootstrap statistics that are <= the original statistic
* acceleration: based on the skewness of the empirical influence function, typically calculated using the jackknife

I agree that you should use far more than 1000 samples. The BCa uses bootstrap quantiles that are adjusted based on the z0 and acceleration parameters, and estimating z0 from the bootstrap samples magnifies the Monte Carlo error. You need roughly twice as many bootstrap samples as for the bootstrap percentile interval, e.g. 10^4 instead of 5000. If computational expense is an issue, you might prefer bootstrap tilting intervals, which require about 1/37 as many bootstrap samples as the BCa for comparable Monte Carlo variability.

Quick overview of confidence intervals:

                      accuracy    comments
t intervals           1/sqrt(n)   Using either a formula or a bootstrap standard
                                  error; poor in the presence of skewness.
bootstrap percentile  1/sqrt(n)   Good quick-and-dirty procedure. Partial skewness
                                  correction. Poor if the statistic is biased.
bootstrap t           1/n         Good coverage, but interval width can vary wildly
                                  when n is small.
BCa                   1/n         Current best overall, but you need a lot of
                                  bootstrap samples, e.g. 10^4.
tilting               1/n         Low Monte Carlo variability, so can use fewer
                                  bootstrap samples. Difficult to implement, and
                                  requires that the statistic can be calculated
                                  with weights.

Advertisement 1: tilting is available in S+Resample, available free from www.insightful.com/downloads/libraries
Advertisement 2: I talk about these more in my short course, Bootstrap Methods and Permutation Tests, Oct 10-11 San Francisco, 3-4 Oct UK. http://www.insightful.com/services/training.asp

| Tim Hesterberg       Senior Research Scientist       |
| [EMAIL PROTECTED]    Insightful Corp.                |
| (206)802-2319        1700 Westlake Ave. N, Suite 500 |
| (206)283-8691 (fax)  Seattle, WA 98109-3044, U.S.A.  |
| www.insightful.com/Hesterberg |

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] small sample techniques
About using t tests and confidence intervals for large samples: "large" may need to be very large. The old pre-computer-age rule of n = 30 is inadequate. For example, for an exponential distribution, the actual size of a nominal 2.5% one-sided t-test is not accurate to within 10% (i.e. between 2.25% and 2.75%) until n is around 5000. The error (actual size minus nominal size) decreases very slowly, at the rate 1/sqrt(n). In practice, real distributions may be even more skewed than the exponential distribution, even though they appear less skewed, if they have long tails. In that case the sample size would need to be even larger for t procedures to be reasonably accurate.

An alternative is to use bootstrapping. Bootstrap procedures whose errors decrease at the rate 1/n include the bootstrap t, BCa, and bootstrap tilting.

Moshe Olshansky [EMAIL PROTECTED] wrote:

If the two populations are normal the t-test gives you the exact result for whatever the sample size is (the sample size will affect the number of degrees of freedom). When the populations are not normal and the sample size is large it is still OK to use the t-test (because of the Central Limit Theorem), but this is not necessarily true for a small sample size. You could use simulation to find the relevant probabilities. ...
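A quick simulation illustrates the point. This sketch (mine, not from the original post) estimates the actual rejection rate of a nominal 2.5% one-sided t-test when sampling from an exponential population with true mean 1:

```r
# Sketch: Monte Carlo estimate of the actual size of a nominal 2.5%
# lower-tailed one-sample t-test under an exponential population
# (true mean mu0 = 1, so every rejection is a type I error).
t.size <- function(n, reps = 10000, mu0 = 1) {
  rejects <- replicate(reps,
    t.test(rexp(n), mu = mu0, alternative = "less")$p.value < 0.025)
  mean(rejects)
}
# e.g. sapply(c(10, 30, 100, 1000), t.size)
# The estimated size approaches 0.025 only slowly as n grows.
```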
Re: [R] Weighted least squares
As John noted, there are different kinds of weights, and different terminology:

* inverse-variance weights (accuracy weights)
* case weights (frequencies, counts)
* sampling weights (selection probability weights)

I'll add:

* inverse-variance weights where var(y for an observation) = 1/weight (as opposed to the variance just being inversely proportional to the weight)
* weights used as part of an algorithm (e.g. for robust estimation, or GLMs using iteratively-reweighted least squares).

For linear regression, the type of weights does not affect the calculation of the regression coefficients, but it does affect inferences such as standard errors for the regression coefficients, degrees of freedom for variance estimates, etc. lm() inferences assume the first type. Other formulae are appropriate for inferences with types 2-4. Combinations of types 1-4 require yet other formulae; this gets nontrivial. For the fifth type, inferences need to be handled by the algorithm that is using weighted linear regression.

Tim Hesterberg

John Fox wrote:

I think that the problem is that the term "weights" has different meanings, which, although they are related, are not quite the same. The weights used by lm() are (inverse-)variance weights, reflecting the variances of the errors, with observations that have low-variance errors therefore being accorded greater weight in the resulting WLS regression. What you have are sometimes called case weights, and I'm unaware of a general way of handling them in R, although you could regenerate the unaggregated data. As you discovered, you get the same coefficients with case weights as with variance weights, but different standard errors. Finally, there are sampling weights, which are inversely proportional to the probability of selection; these are accommodated by the survey package. To complicate matters, this terminology isn't entirely standard.
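A small sketch (my own example, not from the thread) of John's point that case weights reproduce the coefficients of the unaggregated fit but not its standard errors:

```r
# Sketch: frequency (case) weights vs expanding the data.
# lm() with weights reproduces the unaggregated coefficients, but its
# standard errors are computed as if the weights were inverse variances.
x <- 1:5
y <- c(2.1, 3.9, 6.2, 8.1, 9.8)
w <- c(1, 2, 1, 3, 1)                 # frequencies
fit.w <- lm(y ~ x, weights = w)       # weighted fit on aggregated data
xe <- rep(x, w)                       # expand each case w[i] times
ye <- rep(y, w)
fit.e <- lm(ye ~ xe)                  # unaggregated fit
all.equal(unname(coef(fit.w)), unname(coef(fit.e)))  # same coefficients
# ... but summary(fit.w) and summary(fit.e) report different standard
# errors (different residual degrees of freedom: 3 vs 6 here).
```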
Re: [R] sort of OT: bootstrap tutorial
Another introductory document is "Bootstrap Methods and Permutation Tests": http://bcs.whfreeman.com/ips5e/content/cat_080/pdf/moore14.pdf

This focuses on the methods, not on particular software. There is an accompanying library at http://www.insightful.com/Hesterberg/bootstrap/IPSdata.zip that includes the datasets. While this is for S+, R users could adapt the data/readdata.ssc file to load the datasets. The document is written for introductory statistics students, but should be interesting to those who are more advanced. For more info see http://www.insightful.com/Hesterberg/bootstrap/default.asp#Reference.introStat

Patrick Burns wrote:

There is now a tutorial on bootstrapping and other resampling methods at: http://www.burns-stat.com/pages/Tutor/bootstrap_resampling.html
Re: [R] Row-wise two sample T-test on subsets of a matrix
This is the kind of thing that rowMeans was made for. For the numerator of the t statistic:

x1 <- temp.matrix[, 1:11]
x2 <- temp.matrix[, 12:22]
numerator <- rowMeans(x1) - rowMeans(x2)

For the denominator, if you're using S+ you can use rowVars; in R you can program a simple version quickly, e.g.

rowVars <- function(x) {
  means <- rowMeans(x)
  rowSums((x - means)^2) / (ncol(x) - 1)
}

Tim Hesterberg

Hello all, I am trying to run a two sample t-test on a matrix which is a 196002*22 matrix. I want to run the t-test row-wise, with the first 11 columns being part of the first group and columns 12-22 being part of the second group. I tried running something like (temp.matrix being my 196002*22 matrix)

t.test(temp.matrix[,1:11], temp.matrix[,12:22], paired=TRUE)

or something like

as.numeric(t.test(temp.matrix[,1:11], temp.matrix[,12:22], paired=TRUE)[[1]])

so as to only capture the t-value alone, and I get a result for the whole matrix instead of a row-wise result. I want to avoid using a for loop to increment the number of rows as it would take a huge amount of time. Any suggestions would be really appreciated. thanks, nameeta
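Putting the numerator and denominator together, a self-contained sketch of the fully vectorized statistic (Welch/unequal-variance form; function names are mine, not from any package):

```r
# Sketch: vectorized row-wise two-sample (Welch) t statistics,
# with groups in columns g1 and g2 of a matrix m. No loop over rows.
rowVars <- function(x) {
  means <- rowMeans(x)
  rowSums((x - means)^2) / (ncol(x) - 1)
}
row.t <- function(m, g1 = 1:11, g2 = 12:22) {
  x1 <- m[, g1]
  x2 <- m[, g2]
  se <- sqrt(rowVars(x1) / ncol(x1) + rowVars(x2) / ncol(x2))
  (rowMeans(x1) - rowMeans(x2)) / se
}
# Example on a small random matrix:
set.seed(3)
m <- matrix(rnorm(22 * 100), nrow = 100)
t.stats <- row.t(m)   # one t statistic per row
```

Each entry of t.stats matches what t.test() would report for that row (with its default var.equal=FALSE).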
[R] bootstrap syntax + bootstrap for teaching
Previous subject: bootstrap bca confidence intervals for large number of statistics in one model; library(boot)

Jacob Wegelin asked for an easier way to do many bootstrap confidence intervals for regression output. The syntax would be easier with S+Resample; example below. You create an ordinary lm object, then do e.g.

boot <- bootstrap(fit, c(coef(fit), predict(fit, newdata = NEWDATA)))
limits.bca(boot)

RESAMPLING FOR TEACHING

I would encourage people to consider using S+Resample for teaching. There is a free student version of S+, and S+Resample is easier for students -- easier syntax (in general, not just in this example), plus a menu interface. There are free chapters and packages you can use to supplement a statistics course with resampling. See: http://www.insightful.com/Hesterberg/bootstrap/#Reference.introStat

I'll give a workshop on resampling for teaching at the USCOTS conference (U.S. Conference On Teaching Statistics) on May 16. http://www.causeweb.org/uscots http://www.causeweb.org/workshop/hesterberg/

set.seed(0)
x <- runif(150)
y <- 2/3 + pi * x^2 + runif(length(x))/2
plot(x, y)
DAT <- data.frame(x, y)
NEWDATA <- data.frame(x = seq(min(x), max(x), length = 50))
library(resample)
fit <- lm(y ~ x + I(x^2), data = DAT)
boot <- bootstrap(fit, c(coef(fit), predict(fit, newdata = NEWDATA)))
CIs <- limits.bca(boot)
lines(NEWDATA$x, CIs[4:53, 1], col = "red")
lines(NEWDATA$x, CIs[4:53, 4], col = "red")
CIs[1:3, ]

I used x + I(x^2) in the model rather than poly(x,2), because poly is data dependent, so the regression coefficients cannot be used for new data, and confidence intervals for the coefficients are not meaningful. (Note that the I() wrapper is needed; in a formula, plain x^2 is interpreted by the formula algebra as just x.)

Comment: the default is 1000 bootstrap samples; that isn't really enough for BCa intervals, because the BCa calculations magnify the Monte Carlo standard error by roughly a factor of two.

Jacob Wegelin wrote:

Sometimes one might like to obtain pointwise bootstrap bias-corrected, accelerated (BCa) confidence intervals for a large number of statistics computed from a single dataset.
For instance, one might like to get (so as to plot graphically) bootstrap confidence bands for the fitted values in a regression model. (Example: Chiu S et al., Early Acceleration of Head Circumference in Children with Fragile X Syndrome and Autism, Journal of Developmental and Behavioral Pediatrics, 2007, in press. In this paper we plot the estimated trajectories of head circumference for two different groups, computed by linear mixed-effects models, with confidence bands obtained by bootstrap.)

Below is a toy example of how to do this using the boot library. We obtain BCa CIs for all three regression parameters and for the fitted values at 50 levels of the predictor.

set.seed(1234567)
x <- runif(150)
y <- 2/3 + pi * x^2 + runif(length(x))/2
plot(x, y)
DAT <- data.frame(x, y)
NEWDATA <- data.frame(x = seq(min(x), max(x), length = 50))
library('boot')
myfn <- function(data, whichrows) {
  TheseData <- data[whichrows, ]
  thisLM <- lm(y ~ poly(x, 2), data = TheseData)
  thisFit <- predict(thisLM, newdata = NEWDATA)
  c(coef(summary(thisLM))[, "Estimate"], thisFit)
}
bootObj <- boot(data = DAT, statistic = myfn, R = 1000)
names(bootObj)
dim(bootObj$t)
sofar <- t(sapply(1:ncol(bootObj$t), function(thiscolumn)
  boot.ci(bootObj, type = 'bca', index = thiscolumn)$bca[4:5]))
colnames(sofar) <- c("lo", "hi")
NEWDATA <- cbind(NEWDATA, sofar[4:nrow(sofar), ])
thecoefs <- sofar[1:3, ]
lines(NEWDATA$x, NEWDATA$lo, col = 'red')
lines(NEWDATA$x, NEWDATA$hi, col = 'red')

Note: To get boot.ci to run with type='bca' it seems necessary to have a large value of R; otherwise boot.ci returns an error, in my (limited) experience.

Thanks in advance for any critiques. (For instance, is there an easier or more elegant way?)

Caveat: I helped create S+Resample.
Re: [R] Bootstrapping Confidence Intervals for Medians
This is in followup to an earlier message about bootstrap confidence intervals for the difference in two medians. I'll touch on three topics:

* bootstrap theory for one vs two samples
* the special case of the median
* bootstrapping small samples

ONE VS TWO SAMPLES

Brian Ripley wrote (the complete posting is below):

The 'standard' theory of bootstrap confidence intervals as implemented in e.g. package 'boot' is for a single-sample problem (and it would be pushing its justification very hard to use this for n=9). But you have two samples, and haven't told us how you intend to bootstrap. I guess you mean a stratified bootstrap, sampling with replacement independently from observations 1-4 and 5-9. I don't know of theory for bootstrap confidence intervals from that scenario: do you?

Much of the bootstrap theory was developed for a single sample, but carries over fairly easily to two samples or other stratified sampling. Davison and Hinkley do discuss stratified sampling in their book, and Canty implemented it in the 'boot' package. S+Resample also supports stratified sampling, with a front-end function 'bootstrap2' to make the two-sample case easier.

THE SPECIAL CASE OF MEDIANS

Brian Ripley wrote:

Beyond this, there are considerable problems with bootstrapping medians in small samples, as the median is a non-smooth function of the data and the bootstrap samples take very few values. See for example the galaxies dataset as discussed in MASS4. For the stratified bootstrapping I referred to, there are only a handful of possible values of each of the medians, and so the bootstrap distribution is a highly non-uniform one on a few values.

Bootstrapping medians does present special difficulties. Particularly with a single sample and n odd, the bootstrap distribution is often a very inaccurate estimate for the sampling distribution; it takes on only a few values. Curiously, however, bootstrap percentile intervals for a single median are not bad.
They are similar to classical nonparametric intervals based on order statistics, and use either the same or an adjacent order statistic; where they differ, the classical interval is conservative (one-sided non-coverage less than 0.025, often much less), and the bootstrap interval is one order statistic narrower, with closer to the nominal coverage.

BOOTSTRAPPING SMALL VS MEDIUM SAMPLES

People tend to think of bootstrapping for small samples, where they don't trust the central limit theorem, like n=9. That is misguided. They should use the bootstrap in medium size samples, where they shouldn't trust the central limit theorem, like perhaps n=20 (depending on the application) to n=1000. For very small samples the data distribution does not reliably reflect the shape of the population, and it is better to impose parametric assumptions. In other words, with very small samples you can't accurately estimate skewness, so it may be better to use classical methods that just assume that skewness is zero. Conversely, for medium size samples one should not just assume that skewness is zero, but instead use the bootstrap or other methods that allow for skewness.

Yes, n=1000 is a medium size sample -- the effect of skewness on confidence intervals and tests decreases very slowly, with size or coverage errors of O(1/sqrt(n)) for classical t tests and confidence intervals.

One beauty of the bootstrap is that it provides a nice way to estimate the size of the errors you make by assuming sampling distributions are normal -- by creating a bootstrap distribution and seeing how non-normal it is. It provides a nice alternative to the pre-computer rule of using normal methods if n >= 30.

Tim Hesterberg
Brian Ripley wrote:

On Sat, 6 Jan 2007, [EMAIL PROTECTED] wrote:

I apologize for this post. I am new to R (two days) and I have tried and tried to calculate confidence intervals for medians. Can someone help me?

Later, you say you want a confidence interval for a difference in medians, not the same thing. For medians, see MASS4 section 5.7 for worked examples and discussion of the pitfalls.

Here is my data:

institution1: 0.21 0.16 0.32 0.69 1.15 0.9 0.87 0.87 0.73

The first four observations compose group 1 and observations 5 through 9 compose group 2. I would like to create a bootstrapped 90% confidence interval on the difference
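For the data in the question, a stratified (two-sample) percentile bootstrap is short to write. This sketch is mine, not from the thread, and is subject to all the small-sample caveats above: with n = 4 and n = 5, each bootstrap median takes only a handful of values.

```r
# Sketch: stratified bootstrap percentile interval for the difference
# in medians, resampling each group independently.
set.seed(4)
g1 <- c(0.21, 0.16, 0.32, 0.69)          # group 1 (obs 1-4)
g2 <- c(1.15, 0.90, 0.87, 0.87, 0.73)    # group 2 (obs 5-9)
diffs <- replicate(10000,
  median(sample(g1, replace = TRUE)) - median(sample(g2, replace = TRUE)))
quantile(diffs, c(0.05, 0.95))           # 90% percentile interval
# The bootstrap distribution of diffs is highly discrete here --
# plot a histogram of diffs to see how few values it takes.
```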
Re: [R] Time-varying correlation calculation
You can do this using an interaction of bs(time) and x, e.g.:

# Time-varying coefficients
x <- rnorm(100)
time <- ppoints(100)
y <- sin(time) + cos(time)*x + rnorm(100)/10
library(splines)
fit <- lm(y ~ bs(time, df=5) * x)
fit

# Plot estimated intercept vs time
plot(time, coef(fit)[1] + bs(time, df=5) %*% coef(fit)[2:6])
lines(time, sin(time))

# Plot estimated slope vs time
plot(time, coef(fit)[7] + bs(time, df=5) %*% coef(fit)[8:12])
lines(time, cos(time))

Thanks to Trevor Hastie for suggesting this approach when I made a similar query on S-news years ago.

Tim Hesterberg

I'm interested in getting a series of time-varying correlations, simply between two random variables. Could you please introduce a package to do this task? Thank you so much for any help. Amir
Re: [R] Matrix multiplication using apply() or lappy() ?
[EMAIL PROTECTED] asked: I am trying to divide the columns of a matrix by the first row in the matrix.

Dividing the columns of a matrix by a vector is a pretty fundamental operation, and the query resulted in a large number of suggestions:

x / matrix(v, nrow(x), ncol(x), byrow = TRUE)
sweep(x, 2, v, "/")
x / rep(v, each = nrow(x))
x / outer(rep(1, nrow(x)), v)
x %*% diag(1/v)
t(apply(x, 1, function(x) x/v))
t(apply(x, 1, "/", v))
library(reshape); iapply(x, 1, "/", v)  # R only
t(t(x)/v)
scale(x, center = FALSE, v)  # not previously suggested

It is unsatisfactory when such a fundamental operation is done in so many different ways.
* It makes it hard to read other people's code.
* Some of these are very inefficient.

I propose to create standard functions, and possibly operator forms, for this and similar operations:

colPlus(x, v)    x %c+% v
colMinus(x, v)   x %c-% v
colTimes(x, v)   x %c*% v
colDivide(x, v)  x %c/% v
colPower(x, v)   x %c^% v

Goals are:
* more readable code
* generic functions, with methods for objects such as data frames and S-PLUS bigdata objects (this would be for both S-PLUS and R)
* efficiency -- use the fastest of the above methods, or drop to C to avoid replicating v
* error checking (that the length of v matches the number of columns of x)

I'd like feedback (to me; I'll summarize for the list) on:
* the suggestion in general
* are names like colPlus OK, or do you have other suggestions?
* create both functions and operators, or just the functions?
* should there be similar operations for rows?

Note: similar operations for rows are not usually needed, because

x * v  # e.g. where v = rowMeans(x)

is equivalent to (but faster than)

x * rep(v, length = length(x))

The advantage of a row version would be that rowTimes(x, v) could throw an error if length(v) != nrow(x).

Tim Hesterberg

P.S. Of the suggestions, my preference is

x / rep(v, each = nrow(x))

It was to support this and similar +-*/^ operations that I originally added the "each" argument to rep.
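A sketch of what one of the proposed functions might look like (the name colDivide comes from the proposal above; this is not an existing function in any package):

```r
# Sketch of the proposed colDivide: divide column j of x by v[j],
# with the length check that the bare idioms lack.
colDivide <- function(x, v) {
  if (length(v) != ncol(x))
    stop("length(v) must equal ncol(x)")
  x / rep(v, each = nrow(x))   # one of the fast pure-S idioms above
}

x <- matrix(1:6, nrow = 2)
v <- c(1, 2, 3)
colDivide(x, v)   # each column divided by the matching element of v
```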
Re: [R] Is there a way to view S-PLUS script files in R
You can open them in R. On Windows: File > Open Script, change "Files of type" to "All Files", then open the .ssc file.

Tim Hesterberg

I have some S-PLUS script files (.ssc). Does there exist an R function/command that can read such files? I simply want to view the code and practice in R to help me learn the subject matter. Any help would be greatly appreciated. platform i386-pc-mingw32 ...
Re: [R] permutation test for linear models with continuous covariates
What do you want to test?

To test H0: Y is independent of all X's, you can permute Y.

To test H0: a particular X does not contribute to predicting Y, conditional on the other X's, you have to be careful. If you permute that X, the size is wrong; the type I error can be nearly 50%, because you've lost the correlation between that X and the others. I don't know of a suitable permutation test in general for testing individual X's.

S+Resample has a permutationTest() function that lets you specify which columns to permute. The Canty/Davison/Hinkley boot library (in R or S-PLUS) offers permutation as one of the options for sampling in the boot() function; to use this you would specify the Y variable as the data set and pass the X's separately to the statistic.

Tim Hesterberg

Hi, I was wondering if there is a permutation test available in R for linear models with continuous covariates. I want to do a test like the one shown here.

bmi <- rnorm(100, 25)
x <- c(rep(0, 75), rep(1, 25))
y <- rnorm(100) + bmi^(1/2) + rnorm(100, 2)*x + bmi*x
H0 <- lm(y ~ 1 + x + bmi)
H1 <- lm(y ~ 1 + x + bmi + x*bmi)
anova(H0, H1)
summary(lm(y ~ 1 + x + bmi))

But I want to use permutation testing to avoid an inflated p-value due to a y that is not totally normally distributed, and I do not want to log transform y. Thanks, Anders
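A sketch of the easy case, permuting Y (this is my own illustration, not from the thread; it tests only the global null, not one X conditional on the others):

```r
# Sketch: permutation test of H0 "Y is independent of all X's",
# permuting y and comparing the overall F statistic to its
# permutation distribution.
perm.pvalue <- function(y, X, nperm = 999) {
  obs <- summary(lm(y ~ ., data = X))$fstatistic[1]
  perm <- replicate(nperm,
    summary(lm(sample(y) ~ ., data = X))$fstatistic[1])
  # include the observed statistic among the permutations,
  # so the p-value is never exactly zero
  mean(c(obs, perm) >= obs)
}
```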
Re: [R] Method for $
Ulrike Groemping [EMAIL PROTECTED] wrote:

I have defined a class myclass and would like the slots to be extractable not only by @ but also by $. I am now trying to write a method for $ that simply executes the request object@slotname whenever someone calls object$slotname for any object of class myclass. I can't find out how to provide this function with slotname, so that one and the same function works for any arbitrary slotname a user might choose. ...

I would caution against defining methods for $. In addition to Martin Maechler's and Duncan Temple Lang's warnings about the danger of this, I would note that it could make R run much slower. I once tried defining a method for $ in S-PLUS; that converted $ into a generic function, which slowed down every call to $, of which there are many. Even if it wouldn't slow down R, as we work to make R and S-PLUS more compatible, you or someone else might try your code in S-PLUS and cause a big speed hit.

Maybe I could (and should?) have defined the class with just one slot that contains the list, which would make it behave like I want immediately.

Why not make it a list with an S3 class, rather than an S4 class?

Tim Hesterberg
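A minimal sketch of that S3 alternative (class and field names here are illustrative):

```r
# Sketch: a classed list gives $ access for free, with no method
# defined for the $ generic.
myclass <- function(a, b) structure(list(a = a, b = b), class = "myclass")

# Ordinary S3 dispatch still works for other generics:
print.myclass <- function(x, ...) {
  cat("myclass object with fields:", paste(names(x), collapse = ", "), "\n")
  invisible(x)
}

obj <- myclass(1:3, "hello")
obj$a   # field access, no $ method needed
```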
Re: [R] Stratified Bootstrap question
Qian wrote:

I talked with my advisor yesterday about how to do bootstrapping for my scenario: random clinic + random subject within clinic. She suggested that only clinics are independent units, so I can only resample clinics. But I think that since subjects are also independent within a clinic, shall I resample subjects within clinic, which means I have two-stage resampling? Which one do you think makes sense?

This is a tough issue; I don't have a complete answer. I'd appreciate input from other r-help readers.

If you randomly select clinics, then randomly select patients within the clinics:
(1) by bootstrapping just clinics, you capture both sources of variation -- the between-subject variation is incorporated in the results for each clinic;
(2) by bootstrapping clinics, then subjects within clinics, you end up double-counting the between-subject variation.

That argues for resampling just clinics. By analogy, if you have multiple subjects, and multiple measurements per subject, you should just resample subjects.

However, I'm not comfortable with this if you have a small number of clinics and relatively large numbers of patients in each clinic, and think that the between-clinic variation should be small. Then it seems better to resample both clinics and patients. I'm leery about resampling just clinics if there are a small number of clinics. Bootstrapping isn't particularly effective for small samples -- it is subject to skewness in small samples, and it underestimates variances (its advantages over classical methods really show up with medium size samples). There are remedies for the small variance; see

Hesterberg, Tim C. (2004), "Unbiasing the Bootstrap -- Bootknife Sampling vs. Smoothing", Proceedings of the Section on Statistics and the Environment, American Statistical Association, 2924-2930. www.insightful.com/Hesterberg/articles/JSM04-bootknife.pdf

Tim Hesterberg
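A sketch of option (1), resampling whole clinics (the data frame and column names here are made up for illustration):

```r
# Sketch: cluster bootstrap, resampling entire clinics with replacement.
# Each resample keeps all the subjects of every sampled clinic, so the
# between-subject variation is carried along automatically.
set.seed(6)
dat <- data.frame(clinic = rep(1:5, each = 10), y = rnorm(50))

boot.clinics <- function(dat, stat, B = 1000) {
  clinics <- unique(dat$clinic)
  replicate(B, {
    picked <- sample(clinics, replace = TRUE)   # clinics may repeat
    newdat <- do.call(rbind,
      lapply(picked, function(cl) dat[dat$clinic == cl, ]))
    stat(newdat)
  })
}

# Bootstrap standard error of the overall mean:
se <- sd(boot.clinics(dat, function(d) mean(d$y), B = 200))
```

With only 5 clinics this illustrates the small-sample caveat above: the cluster bootstrap's variance estimate will tend to be too small.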
Re: [R] Stratified Bootstrap question
Dear Qian,

Yes: when bootstrap sampling by subject, if a subject is included in a bootstrap dataset multiple times, you must give that subject different IDs, or some statistics would be incorrect. Here's what S+Resample does (from help(bootstrap.args)):

If subject is the name of a variable in the data frame (for example data=Orthodont, subject=Subject), then bootstrap makes resampled subjects unique; that is, duplicated subjects in a given resample are assigned distinct subject values in the resampled data frame before the statistic is evaluated; this is useful for longitudinal and other modeling where the statistic expects subjects to have unique values.

Tim Hesterberg

Dear Tim, Thank you so much for your help. My random mixed model is as follows:

b.lme <- lme(sbp ~ age + gender, data=bdat, random = ~1 | clinic/id, na.action=na.omit)

When doing a bootstrap with stratum clinic, a patient's data may appear multiple times in the bootstrap dataset, and all of those copies share the same id. I am wondering if the data from the same patient will cause problems in lme fitting or not. Do you happen to know? Sincerely yours, Qian
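The same idea can be sketched in R (the function and column names are mine, not S+Resample's API): resample subjects with replacement and give each draw a fresh ID, so lme never sees duplicated subjects.

```r
# Sketch: resample subjects with replacement from long-format data,
# assigning each sampled subject a distinct new ID, as S+Resample
# does, so duplicated subjects look unique to the model.
resample.subjects <- function(dat, subject = "id") {
  ids <- unique(dat[[subject]])
  picked <- sample(ids, replace = TRUE)
  pieces <- lapply(seq_along(picked), function(i) {
    d <- dat[dat[[subject]] == picked[i], ]
    d[[subject]] <- i   # fresh ID, even if this subject was drawn before
    d
  })
  do.call(rbind, pieces)
}

# Example: 3 subjects, 2 measurements each
dat <- data.frame(id = rep(1:3, each = 2), y = 1:6)
set.seed(7)
newdat <- resample.subjects(dat)   # same size, but IDs are all distinct
```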
[R] Stratified Bootstrap question
Dear Qian,

You might try the S+Resample library, which has built-in support for both sampling by subject and stratified sampling. If you are a student, there is a free student version of S+. See:

  www.insightful.com/downloads/libraries (S+Resample)
  www.insightful.com/Hesterberg/bootstrap (has a link to the student version)

For the missing values, consider the S+Missing library, which offers multiple imputation. In S+, do library(missing).

Tim Hesterberg

P.S. The combination of sampling by subject and stratified sampling was terribly messy to program. If I'd known in advance how messy, I never would have done it :-( But it is done now.

> Dear R users,
>
> I have a question about stratified bootstrapping and how to implement it
> using boot() in R's boot package. My dataset is longitudinal (3
> measurements per person, at years 1, 4 and 7), composed of multiple
> clinic centers with multiple participants within each clinic. It has
> missing values. I want to use the bootstrap to find standard errors and
> confidence intervals for my variance components. My model is a mixed
> model with random clinic and random participant within clinic.
>
> I have thought of two ways to do the bootstrap:
> (1) Bootstrap the data; however, I have trouble specifying the second
>     argument of my statistic function: should I use indices, weights or
>     frequencies, and how do I relate them to my dataset?
> (2) Bootstrap the residuals; however, the dataset has multiple
>     measurements and missing values, and I am not sure how to construct
>     a new data frame containing the residuals and fitted values.
>
> Any ideas will be highly appreciated!
>
> Sincerely yours,
> Qian
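For option (1) with R's boot package, one common approach is to pass the vector of clinic labels to boot() and let the index argument of the statistic function pick whole clinics, rebuilding the data frame inside the statistic and relabelling clinic and patient IDs so that duplicates stay distinct. A minimal self-contained sketch on simulated data (the names bdat, sbp, age, gender, clinic, id mirror the names in the thread; the data and everything else here are made up for illustration):

```r
library(boot)
library(nlme)

# Simulated stand-in for Qian's data: 4 clinics, 5 patients per clinic,
# each measured at years 1, 4 and 7.
set.seed(42)
bdat <- expand.grid(id = 1:5, clinic = letters[1:4], year = c(1, 4, 7))
cl.eff <- rnorm(4, sd = 3)                      # true clinic effects
bdat$age    <- 40 + bdat$year + rnorm(nrow(bdat))
bdat$gender <- rep_len(c("f", "m"), nrow(bdat))
bdat$sbp    <- 120 + 0.5 * bdat$age +
               cl.eff[as.integer(bdat$clinic)] + rnorm(nrow(bdat), sd = 5)

clinics <- as.character(unique(bdat$clinic))

# Statistic for boot(): i indexes the resampled clinics.  Rebuild the data
# clinic by clinic, relabelling clinic and id so that a clinic drawn twice
# enters the model as two distinct clinics (the subject-uniqueness point).
vc.stat <- function(cl, i) {
  pieces <- lapply(seq_along(i), function(k) {
    piece <- bdat[bdat$clinic == cl[i[k]], , drop = FALSE]
    piece$clinic <- paste(piece$clinic, k, sep = ".")
    piece$id     <- paste(piece$id, k, sep = ".")
    piece
  })
  newdat <- do.call(rbind, pieces)
  fit <- tryCatch(
    lme(sbp ~ age + gender, data = newdat, random = ~1 | clinic/id,
        na.action = na.omit, control = lmeControl(returnObject = TRUE)),
    error = function(e) NULL)
  if (is.null(fit)) return(NA_real_)            # skip rare non-convergence
  fit$sigma^2   # residual variance; VarCorr(fit) holds the other components
}

b  <- boot(clinics, vc.stat, R = 99)
se <- sd(b$t, na.rm = TRUE)                         # bootstrap standard error
ci <- quantile(b$t, c(0.025, 0.975), na.rm = TRUE)  # percentile interval
```

If the statistic returns only finite values, boot.ci(b, type = "perc") gives the same percentile interval. With only 4 clinics this is purely illustrative: in practice one would want many more clusters (and more replicates than 99) before trusting the interval.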
[R] Re: [S] January advanced R/Splus course in Boston?
> I learnt there is an advanced R/S-PLUS course in Boston this January.
> Has anyone got the announcement? Please kindly forward it to me.

I'm giving an Advanced Programming in S-PLUS course in Boston and San Francisco in March. See:

  http://www.insightful.com/services/training.asp
  http://www.insightful.com/services/schedule.asp
  http://www.insightful.com/services/course.asp?CID=37

I'm also giving a Bootstrap Methods and Permutation Tests course in the same places.

Tim Hesterberg