Thank you all for your very insightful comments! And thank you for the directions to the packages!
Re: non-statistical issues: yes, I was reading Altman and Royston's Stat Med 2000 article "What do we mean by validating a prognostic model?" last night, and it was very interesting. I'm working on expression profiling in tumor samples, and there are several difficulties in designing an experiment along those 'ideal' guidelines.

A. Small sample size is of course the most common recurring problem, with a concomitant, even lower, event rate.

B. Patient recruitment is another issue, as many of the samples degrade over time - the older the sample, the more degraded it usually is! That biases selection towards recent cases. Different centres also have different storage techniques, resulting in extensive degradation in samples from one centre and relatively intact samples from others. So there is no choice but to perform "data-driven selection" of cases - i.e. to use only samples which have good RNA.

Other problems I've encountered include:

C. Computational time. Using a training sample size of 50 arrays, running a full internal cross-validation of a model derived using pamr.cv.cox took my computer about one and a half hours, with no other process running (P4, 3 GHz, 2 GB RAM, R 1.9.1, Windows XP). And that's just *one* randomization!

Min-Han

On Tue, 28 Sep 2004 10:55:50 -0700, Berton Gunter <[EMAIL PROTECTED]> wrote:
>
> But note that there may be deeper, non-statistical, issues of what you mean
> by "validation" here: how good must the predictions be on the validation
> data? How similar or dissimilar should the validation data be to the
> "training" data? To what end/population is the fitted model to be applied?
> For example, AFAIK in most scientific research, a model is not considered
> "validated" unless results can be substantively reproduced (??) in different
> labs, sometimes with alternative methods.
>
> Think of the 1916 (I think it was) measurements of star positions during a
> total solar eclipse to "validate" Einstein's Theory of General Relativity.
> My point is not to say that this kind of "validation" is appropriate for a
> Cox model, but only that the issues are worth thinking about.
>
> -- Bert Gunter
> Genentech Non-Clinical Statistics
> South San Francisco, CA
>
> "The business of the statistician is to catalyze the scientific learning
> process." - George E. P. Box
>
> > -----Original Message-----
> > From: [EMAIL PROTECTED]
> > [mailto:[EMAIL PROTECTED] On Behalf Of Frank
> > E Harrell Jr
> > Sent: Tuesday, September 28, 2004 10:11 AM
> > To: Min-Han Tan
> > Cc: [EMAIL PROTECTED]
> > Subject: Re: [R] Validating a Cox model on an external set
> >
> > Min-Han Tan wrote:
> > > Good morning,
> > >
> > > Sorry to trouble the list.
> > >
> > > I have a problem I hope to seek your advice on.
> > >
> > > Essentially, I am trying to 'validate' a multivariate Cox proportional
> > > hazards model built in a training set, by testing it on an external
> > > test set. I have performed a survfit using the Cox model to predict
> > > survival for the test set, and obtained individual predictions for
> > > survival time, with standard error for each test sample. Each of these
> > > cases has an actual survival time, some censored.
> > >
> > > How can we decide whether the Cox model has been validated or not?
> >
> > This is what the Design package and its cph and validate.cph and
> > calibrate.cph functions are for.
> >
> > >
> > > I was suggested survdiff in the survival package, but survdiff works
> > > between curves; am not sure how I could use it (I have a predicted
> > > curve for each curve, but no 'observed curve' - the only observation
> > > is death or censoring at time x)
> > >
> > > Thank you all so much!
> > >
> > > Min-Han Tan
> > > Van Andel Institute
> > >
> > > ______________________________________________
> > > [EMAIL PROTECTED] mailing list
> > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > PLEASE do read the posting guide!
> > > http://www.R-project.org/posting-guide.html
> >
> > --
> > Frank E Harrell Jr Professor and Chair School of Medicine
> > Department of Biostatistics
> > Vanderbilt University
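
A footnote on the original question in this thread (how to judge predictions from a training-set Cox model against observed, partly censored survival times in an external set): one widely used summary of discrimination is Harrell's concordance index, the fraction of usable case pairs in which the case with the higher predicted risk also had the shorter observed survival. The sketch below is only a minimal illustration of that idea in plain Python. It is not the Design package's validate.cph or calibrate.cph, it says nothing about calibration, and it skips tied survival times for simplicity. The function name `concordance_index` and the toy numbers are hypothetical and do not come from the thread.

```python
from itertools import combinations


def concordance_index(times, events, risk_scores):
    """Harrell's C for right-censored data (simplified sketch).

    times:       observed follow-up time for each case
    events:      1 if the event (death) was observed, 0 if censored
    risk_scores: model output; a higher score means higher predicted risk
    """
    concordant = 0.0
    usable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that case `a` has the shorter follow-up time.
        a, b = (i, j) if times[i] <= times[j] else (j, i)
        # A pair is usable only if the shorter time is an observed event.
        # Pairs with tied times are skipped in this simplified version.
        if times[a] == times[b] or not events[a]:
            continue
        usable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0   # higher risk failed first: concordant
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5   # tied predictions count as half
    if usable == 0:
        raise ValueError("no usable pairs (e.g. every case is censored)")
    return concordant / usable


# Hypothetical external test set: follow-up times, event indicators, and
# risk scores (e.g. linear predictors) from a model fitted elsewhere.
test_times = [5, 8, 12, 20, 25]
test_events = [1, 1, 0, 1, 0]
test_risk = [2.1, 1.7, 1.9, 0.4, 0.2]
print(concordance_index(test_times, test_events, test_risk))
```

A value near 0.5 suggests the predictions rank the external cases no better than chance, while values approaching 1 suggest good discrimination. With small samples and few events, as described above, the estimate will be noisy, so it is best read alongside a calibration check rather than on its own.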
