cigs figs
On Slate, there is quite a good discussion of the meaning and probabilistic basis of the statement that 1 in 3 teen smokers will die of cancer. It is written by a math prof and it is one of the most effective lay discussions I've seen of the use of probabilities in describing health risks. http://slate.msn.com/math/01-06-14/math.asp = Instructions for joining and leaving this list and remarks about the problem of INAPPROPRIATE MESSAGES are available at http://jse.stat.ncsu.edu/ =
Re: Normality in Factor Analysis
In article 9gg7ht$qa3$[EMAIL PROTECTED], haytham siala [EMAIL PROTECTED] wrote: Hi, I have a question regarding factor analysis: Is normality an important precondition for using factor analysis? If not, are there any books that justify this?
Factor analysis is quite robust against non-normality. The essential factor structure is hardly affected by it at all, although the representation may become somewhat sensitive if data-dependent normalizations are used, such as using correlations rather than covariances, or forcing a normalization on the covariance matrix of the factors. Some of this is in my paper with Anderson in the Proceedings of the Third Berkeley Symposium. The result on the asymptotic distribution, which is not at all difficult to derive, is in one of my abstracts in _Annals of Mathematical Statistics_, 1955. It is basically this:
Suppose the factor model is x = \Lambda f + s, with f the common factors and s the specific factors. Further suppose that f and s, and also the elements of s, are uncorrelated, and that there is adequate normalization and smooth identification of the model by the elements of \Lambda alone. Now estimate \Lambda; M, the covariance matrix of f; and S, the diagonal covariance matrix of s. Under the usual assumptions for asymptotic normality of the sample covariances of the elements of f with s, and of the pairs of different elements of s, the deviations of the estimates of \Lambda from its actual value, and of the estimates of M and S from the SAMPLE values of M and S, will have the expected joint asymptotic normal distribution. This makes no assumption about the distribution of M and S about their expected values, which is the main place where normality has an effect.
-- This address is for information only. I do not claim that these views are those of the Statistics Department or of Purdue University. Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN 47907-1399 [EMAIL PROTECTED] Phone: (765)494-6054 FAX: (765)494-0558
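A small simulation can make the robustness claim concrete. This is a sketch under stated assumptions, not Rubin's derivation: it uses a one-factor model with three indicators (which is exactly identified) and a simple method-of-moments "triad" estimator rather than maximum likelihood. The loadings, the sample size, and the choice of a skewed exponential factor with uniform specifics are all arbitrary, hypothetical choices made for illustration.

```python
import numpy as np

def one_factor_loadings(x):
    """Method-of-moments loadings for a one-factor model with three indicators.

    With x_i = lam_i * f + s_i, Var(f) = 1 and uncorrelated specifics,
    Cov(x_i, x_j) = lam_i * lam_j for i != j, hence
    lam_i**2 = Cov(x_i, x_j) * Cov(x_i, x_k) / Cov(x_j, x_k).
    Loadings are assumed positive (this fixes their sign).
    """
    c = np.cov(x, rowvar=False)
    lam = np.empty(3)
    for i in range(3):
        j, k = [m for m in range(3) if m != i]
        lam[i] = np.sqrt(c[i, j] * c[i, k] / c[j, k])
    return lam

def simulate(lam, n, normal, rng):
    """Draw n observations of x = lam * f + s with Var(f) = 1 and Var(x_i) = 1."""
    spec_sd = np.sqrt(1.0 - lam ** 2)
    if normal:
        f = rng.standard_normal(n)
        s = rng.standard_normal((n, 3)) * spec_sd
    else:
        f = rng.exponential(1.0, n) - 1.0          # skewed factor: mean 0, variance 1
        half = np.sqrt(3.0) * spec_sd               # uniform(-h, h) has variance h**2 / 3
        s = rng.uniform(-half, half, size=(n, 3))   # flat, non-normal specifics
    return f[:, None] * lam + s

rng = np.random.default_rng(0)
true_lam = np.array([0.9, 0.7, 0.5])  # hypothetical loadings
est_normal = one_factor_loadings(simulate(true_lam, 200_000, True, rng))
est_skewed = one_factor_loadings(simulate(true_lam, 200_000, False, rng))
print("normal data:    ", np.round(est_normal, 3))
print("non-normal data:", np.round(est_skewed, 3))
```

Both sets of estimates should land close to the true loadings. What non-normality does change is the sampling variability of the sample covariances (it depends on fourth moments), which is consistent with the post's point that normality matters mainly for how M and S vary about their expected values.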
Re: meta-analysis
On 17 Jun 2001 04:34:26 -0700, [EMAIL PROTECTED] (Marc) wrote: I have to summarize the results of some clinical trials. Unfortunately the reported information is not complete. The information given in the trials contains: (1) mean effect in the treatment group (days of hospitalization); (2) mean effect in the control group (days of hospitalization); (3) numbers of patients in the control and treatment groups; (4) p-values of a t-test (of the difference between treatment and control). My question: How can I calculate the variance of the treatment difference, which I need to perform a meta-analysis? Note that the numbers of patients in the ...
Aren't you going too far? You said you have to summarize. Well, summarize. The difference can be stated in days, or as a percentage increase. And you have the t-tests and p-values. You might be right in what you propose, but I think you are much more likely to produce a useful report if you keep it simple.
You are right; meta-analyses are complex. And a majority of the published ones are (in my opinion) awful.
-- Rich Ulrich, [EMAIL PROTECTED] http://www.pitt.edu/~wpilib/index.html
Re: Marijuana
On 15 Jun 2001 02:04:36 -0700, [EMAIL PROTECTED] (Eamon) wrote: [ snip, Paul Jones. About marijuana statistics.] Surely this whole research is based upon a false premise. Isn't it like saying that 90%, say, of heroin users previously used soft drugs, and that therefore soft-drug use usually leads to hard-drug use? That does not logically follow (A => B does not imply B => A). Conclusions drawn from the set of people who have had heart attacks cannot be validly applied to the set of people who smoke dope. Rather than collect data from a large number of people who had heart attacks and look for a backward link, they should monitor a large number of people who smoke dope. But, of course, this is much more expensive.
It is much more expensive, but it would also be totally stupid to carry out the expensive research if the *cheap* and lousy research didn't give you a hint that there might be something going on. The numbers that he was asking about do pass the simple test. I mean, there were not 1 million people contributing one hour each, but we should still ask, *Would* this say something? If it would not, then the whole question is *totally* arid.
The 2x2 table is approximately as follows (dividing the first column by 100, and getting cell counts by subtracting from a total):
    10687   124
      175     9
That gives a contingency chi-square of 21.2 (18.2 with Yates' correction), with p-values under .001. The odds ratio is 4.4. That is pretty convincing that there is SOMETHING going on, POSSIBLY something that merits an explanation. The expected count for the cell with 9 is just 2.2; the tiny cell is the one that matters for contributions to the test, which is why it is okay to lop the hundreds off the first column (to make it readable). Now, you may return to your discussion of why the table is not any good, and what is needed for a proper test.
-- Rich Ulrich, [EMAIL PROTECTED] http://www.pitt.edu/~wpilib/index.html
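The figures quoted for the marijuana table can be checked directly. The sketch below computes the Pearson chi-square with and without Yates' continuity correction, the odds ratio, and the expected count of the small cell for the table as posted (first column divided by 100). The 1-df p-value uses the identity P(chi-square_1 > x) = erfc(sqrt(x/2)), so only the standard library is needed; the function name is just illustrative.

```python
import math

def two_by_two(a, b, c, d):
    """Summary statistics for the 2x2 table [[a, b], [c, d]].

    Returns the Pearson chi-square, the Yates-corrected chi-square, the odds
    ratio a*d/(b*c), the expected count of cell d, and 1-df p-values for both
    chi-square statistics.
    """
    n = a + b + c + d
    r1, r2 = a + b, c + d
    c1, c2 = a + c, b + d
    denom = r1 * r2 * c1 * c2
    chi2 = n * (a * d - b * c) ** 2 / denom
    chi2_yates = n * max(0.0, abs(a * d - b * c) - n / 2) ** 2 / denom
    odds_ratio = (a * d) / (b * c)
    expected_d = r2 * c2 / n  # expected count for the small lower-right cell
    # For 1 degree of freedom, P(chi-square > x) = erfc(sqrt(x / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    p_yates = math.erfc(math.sqrt(chi2_yates / 2))
    return chi2, chi2_yates, odds_ratio, expected_d, p, p_yates

# The table from the post, with the first column divided by 100.
chi2, chi2_yates, odds_ratio, expected_d, p, p_yates = two_by_two(10687, 124, 175, 9)
print(f"chi2={chi2:.1f} yates={chi2_yates:.1f} OR={odds_ratio:.1f} E[d]={expected_d:.1f}")
# chi2=21.2 yates=18.2 OR=4.4 E[d]=2.2
```

Because the odds ratio has one first-column count in the numerator and the other in the denominator, dividing the whole first column by 100 leaves it unchanged.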
Re: meta-analysis
On 17 Jun 2001, Marc wrote (edited): I have to summarize the results of some clinical trials. The information given in the trials contains: mean effects (days of hospitalization) in the treatment and control groups; numbers of patients in the groups; p-values of a t-test (of the difference between treatment and control). My question: How can I calculate the variance of the treatment difference, which I need to perform a meta-analysis? Note that the numbers of patients in the groups are not equal. Is it possible to do it like this: s^2 = (difference between control and treatment)^2 / ((1/n1 + 1/n2) * t^2)
Yes, if you know t. If all you know is that p < alpha for some alpha, then you know only that |t| exceeds the t corresponding to alpha (AND you need to know whether the test was one-sided or two-sided; of course, you need to know that in any case). You can substitute that corresponding t to obtain an upper bound on s^2, ASSUMING that the t was calculated using a pooled variance (your s^2), not using the separate-variances expression in the denominator, sqrt(s1^2/n1 + s2^2/n2).
Note that this s^2 is NOT the variance of the treatment difference, which you said you wanted to know; it is the pooled estimate of the variance within each group. The variance of the difference in treatment means, which _may_ be what you are interested in, would be (difference)^2 / t^2, with the same caveats concerning what you know about t.
How exact would such an approximation be?
That depends on the precision with which p was reported.
Donald F. Burrill [EMAIL PROTECTED] 184 Nashua Road, Bedford, NH 03110 603-471-7128
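Burrill's two formulas can be written as a small helper. This is a sketch that assumes the reported t used the pooled-variance denominator; the trial numbers in the example are hypothetical. The inverse-variance weight is included because that is how a fixed-effect meta-analysis would typically use the variance of the difference. If a t value has to be recovered from an exact reported p, an inverse t distribution such as `scipy.stats.t.isf(p / 2, n1 + n2 - 2)` (two-sided, pooled degrees of freedom) could supply it; that step is left out to keep the sketch dependency-free.

```python
def variances_from_t(diff, t, n1, n2):
    """Back out variances from a reported pooled-variance two-sample t statistic.

    Assumes t = diff / sqrt(s2 * (1/n1 + 1/n2)), where s2 is the pooled
    within-group variance (not the separate-variances form).
    Returns (s2, var_diff, weight):
      s2        pooled within-group variance
      var_diff  variance of the difference in means, equal to diff**2 / t**2
      weight    inverse-variance weight 1 / var_diff (fixed-effect meta-analysis)
    """
    k = 1.0 / n1 + 1.0 / n2
    s2 = diff ** 2 / (k * t ** 2)
    var_diff = s2 * k  # algebraically the same as diff**2 / t**2
    return s2, var_diff, 1.0 / var_diff

# Hypothetical trial: treatment shortens the stay by 2.0 days, t = 2.5, n1 = 40, n2 = 50.
s2, var_diff, weight = variances_from_t(2.0, 2.5, 40, 50)
print(f"s^2 = {s2:.3f}, Var(difference) = {var_diff:.3f}, weight = {weight:.3f}")
```

Passing the critical t for a reported "p < alpha" instead of the exact t gives upper bounds on both variances, as the post notes.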
Re: individual item analysis
On 15 Jun 2001 14:24:39 -0700, [EMAIL PROTECTED] (Doug Sawyer) wrote: I am trying to locate a journal article or textbook that addresses whether or not exam questions can be normalized when the questions are grouped differently. For example, could a question bank be developed from which any subset of questions could be selected, with the assembled exam normalized? What is the name of this area of statistics? What authors or keywords would I use for such a search? Do you know whether or not this can be done?
I believe that they do this sort of thing in scholastic achievement tests, as a matter of course. Isn't that how they make the transition from year to year? I guess this would be norming. A few weeks ago, I discovered that there is a whole series of tech reports put out by one of the big test companies. I would look there for this sort of question.
-- Rich Ulrich, [EMAIL PROTECTED] http://www.pitt.edu/~wpilib/index.html
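One classical technique in this area, usually filed under "test equating" (with item response theory as the usual framework when every examinee may see a different subset of a bank), is linear equating: a raw score on one form is mapped to the scale of another form by matching means and standard deviations. The sketch below is illustrative only; it assumes both forms were given to randomly equivalent groups, and the score lists are hypothetical.

```python
from statistics import mean, pstdev

def linear_equate(x, form_x_scores, form_y_scores):
    """Linear (mean-sigma) equating: put a form-X raw score on the form-Y scale.

    Matches the means and standard deviations of the two score distributions,
    assuming both forms were taken by randomly equivalent groups of examinees.
    """
    mx, sx = mean(form_x_scores), pstdev(form_x_scores)
    my, sy = mean(form_y_scores), pstdev(form_y_scores)
    # Give x the same z-score on form Y: (y - my) / sy == (x - mx) / sx
    return my + (sy / sx) * (x - mx)

# Hypothetical raw scores from two exams assembled from the same question bank.
form_x = [10, 12, 14, 16, 18]
form_y = [20, 24, 28, 32, 36]
print(linear_equate(16, form_x, form_y))
```

Here a score of 16 on form X sits one population SD above its mean, so it maps to one SD above the form-Y mean.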
Re: Factor Analysis
It's not really possible to explain this in lay person's terms. The difference between principal components analysis and common factor analysis is roughly that PCA uses the raw scores, whereas factor analysis uses the scores predicted from the other variables and leaves out the residuals. That's as close to lay terms as I can get. I have never heard a simple explanation of maximum likelihood estimation, but: MLE compares the observed covariance matrix with a covariance matrix predicted by probability theory, and uses that comparison to estimate the factor loadings etc. that would 'fit' a (multivariate) normal distribution. MLE factor analysis is commonly used in structural equation modelling, hence Tracey Continelli's conflation of it with SEM. This is not correct, though. I'd love to hear a simple explanation of MLE!
From: [EMAIL PROTECTED] (Tracey Continelli)
Organization: http://groups.google.com/
Newsgroups: sci.stat.consult,sci.stat.edu,sci.stat.math
Date: 15 Jun 2001 20:26:48 -0700
Subject: Re: Factor Analysis
Hi there, would someone please explain in lay person's terms the difference between principal components, common factors, and maximum likelihood estimation procedures for factor analyses? Should I expect my factors obtained through maximum likelihood estimation to be highly correlated? Why? When should I use a maximum likelihood estimation procedure, and when should I not use it? Thanks. Rita [EMAIL PROTECTED]
Unlike the other methods, maximum likelihood allows you to estimate the entire structural model *simultaneously* [i.e., the effects of every independent variable upon every dependent variable in your model]. Most other methods only permit you to estimate the model in pieces, i.e., as a series of regressions whereby you regress every dependent variable upon every independent variable that has an arrow directly pointing to it.
Moreover, maximum likelihood actually provides a statistical test of significance, unlike many other methods, which only provide generally accepted cut-off points but not an actual test of statistical significance. There are very few cases in which I would use anything except a maximum likelihood approach, which you can run in LISREL or, if you use SPSS, with the add-on module AMOS. Tracey
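The "raw scores versus scores predicted from the other variables" contrast in the first reply can be sketched as a difference in what sits on the diagonal of the correlation matrix being analyzed. PCA keeps the 1s, so each variable's unique variance is included; one-step principal-axis factoring replaces them with squared multiple correlations (SMC), the share of each variable's variance predicted by the others. This is a minimal illustration, not any package's implementation; the one-factor correlation matrix and its loadings are hypothetical.

```python
import numpy as np

def top_eigen_loadings(m):
    """Loadings on the largest eigenvalue of a symmetric matrix."""
    vals, vecs = np.linalg.eigh(m)        # eigenvalues in ascending order
    v = vecs[:, -1] * np.sqrt(vals[-1])   # scale eigenvector by sqrt(eigenvalue)
    return v * np.sign(v.sum())           # eigenvector sign is arbitrary; make it positive

def pc_loadings(r):
    """First principal component: analyze R as is, with 1s on the diagonal."""
    return top_eigen_loadings(r)

def paf_loadings(r):
    """One-step principal-axis factoring: SMCs on the diagonal of R."""
    smc = 1.0 - 1.0 / np.diag(np.linalg.inv(r))
    reduced = r.copy()
    np.fill_diagonal(reduced, smc)
    return top_eigen_loadings(reduced)

# Hypothetical population correlation matrix generated by a one-factor model.
lam = np.array([0.8, 0.7, 0.6, 0.5])
r = np.outer(lam, lam)
np.fill_diagonal(r, 1.0)

pc = pc_loadings(r)
paf = paf_loadings(r)
print("true:", lam)
print("PCA: ", np.round(pc, 2))
print("PAF: ", np.round(paf, 2))
```

In this constructed example the PCA loadings come out inflated because unique variance is folded into the component, while the SMC version lands closer to the loadings that generated the matrix. Iterating the communality estimates is the usual refinement.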