cigs figs

2001-06-17 Thread EugeneGall

On Slate there is quite a good discussion of the meaning and probabilistic
basis of the statement that 1 in 3 teen smokers will die of cancer.  It is
written by a math professor, and it is one of the most effective lay
discussions I've seen of the use of probabilities in describing health risks.

http://slate.msn.com/math/01-06-14/math.asp





Re: Normality in Factor Analysis

2001-06-17 Thread Herman Rubin

In article 9gg7ht$qa3$[EMAIL PROTECTED],
haytham siala [EMAIL PROTECTED] wrote:

 Hi,

 I have a question regarding factor analysis: Is normality an important
 precondition for using factor analysis?

 If not, are there any books that justify this?

Factor analysis is quite robust against non-normality.
The essential factor structure is hardly affected by it
at all, although the representation may become somewhat
sensitive if data-dependent normalizations are used, such
as using correlations rather than covariances, or forcing
a normalization on the covariance matrix of the factors.

Some of this is in my paper with Anderson in the
Proceedings of the Third Berkeley Symposium.  The result
on the asymptotic distribution, not at all difficult to
derive, is in one of my abstracts in _Annals of
Mathematical Statistics_, 1955.  It is basically this:

Suppose the factor model is 

x = \Lambda f + s,

f the common factors and s the specific factors.  Further
suppose that f and s, and also the elements of s, are
uncorrelated, and there is adequate normalization and
smooth identification of the model by the elements of
\Lambda alone.  Now estimate \Lambda, M, the covariance
matrix of f, and S, the diagonal covariance matrix of s.
Assuming the usual conditions for asymptotic normality of
the sample covariances of the elements of f with s, and of
the pairs of different elements of s, the deviations of
the estimates of \Lambda, and of the SAMPLE values of M
and S, from their actual values will have the expected
asymptotic joint normal distribution.  This makes no
assumption about the distribution of M and S about their
expected values, which is the main place where there is an
effect of normality.
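
A minimal simulation sketch of this setup (my own illustration, not part
of the argument above; it assumes Python with numpy and scikit-learn, and
scikit-learn's FactorAnalysis is just one convenient estimator, not the
estimator analyzed above):

import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(0)
n = 5000
Lambda = np.array([[0.9, 0.0], [0.8, 0.1], [0.7, 0.0],
                   [0.0, 0.9], [0.1, 0.8], [0.0, 0.7]])   # true loadings (p x k)

# deliberately non-normal common factors f and specific factors s
f = rng.exponential(1.0, size=(n, 2)) - 1.0        # centered but skewed
s = 0.5 * rng.uniform(-1.0, 1.0, size=(n, 6))      # uniform specific factors
x = f @ Lambda.T + s                               # x = Lambda f + s

fa = FactorAnalysis(n_components=2).fit(x)
print(fa.components_.T)    # estimated loadings: same structure as Lambda
                           # (up to sign/rotation) despite the non-normality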



-- 
This address is for information only.  I do not claim that these views
are those of the Statistics Department or of Purdue University.
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN47907-1399
[EMAIL PROTECTED] Phone: (765)494-6054   FAX: (765)494-0558





Re: meta-analysis

2001-06-17 Thread Rich Ulrich

On 17 Jun 2001 04:34:26 -0700, [EMAIL PROTECTED] (Marc)
wrote:

 I have to summarize the results of some clinical trials.
 Unfortunately the reported information is not complete.
 The information given in the trials contain:
 
 (1) Mean effect in the treatment group (days of hospitalization)
 
 (2) Mean effect in the control group (days of hospitalization)
 
 (3) Numbers of patients in the control and treatment group
 
 (4) p-values of a t-test (of the difference between treatment
 and control)
 My question:
 How can I calculate the variance of treatment difference which I need
 to perform meta-analysis? Note that the numbers of patients in the
 groups are not equal.

Aren't you going too far?  You said you have to summarize.
Well, summarize.  The difference is in terms of days,
or in terms of percentage increase.

And you have the t-test and p-values.  

You might be right in what you propose, but I think
you are much more likely to produce a useful report 
if you keep it simple.

You are right; meta-analyses are complex.  And a 
majority of the published ones are (in my opinion) awful.
--
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html





Re: Marijuana

2001-06-17 Thread Rich Ulrich

On 15 Jun 2001 02:04:36 -0700, [EMAIL PROTECTED] (Eamon) wrote:

[ snip, Paul Jones.  About marijuana statistics.]

 
 Surely this whole research is based upon a false premise.  Isn't it
 like saying that 90%, say, of heroin users previously used soft drugs,
 therefore soft-drug use usually leads to hard-drug use - which does
 not logically follow.  (A => B does not imply B => A)
 
 Conclusions drawn from the set of people who have had heart attacks
 cannot be validly applied to the set of people who smoke dope.
 Rather than collect data from a large number of people who had heart
 attacks and look for a backward link, they should monitor a large
 number of people who smoke dope. But, of course this is much more
 expensive.

It is much more expensive, but it is also totally stupid to carry out
the expensive research if the *cheap* and lousy research didn't
give you a hint that there might be something going on.

The numbers that he was asking about do pass the simple
test.  I mean, there were not 1 million people contributing one
hour each, but we should still ask, *Would*  this say something?
If it would not, then the whole question is *totally*  arid.  The 2x2
table is approximately
(dividing the first column by 100; and subtracting from a total):
  10687   and   124
    175   and     9

That gives a contingency (chi-square) test statistic of 21.2, or 18.2
with continuity correction, with p-values under .001.  The odds ratio
for that table is 4.4.  That is pretty convincing that there is
SOMETHING going on, POSSIBLY something that merits an explanation.
The expected count for the cell with 9 is just 2.2 -- the tiny cell is
the cell that matters for contributions to the test -- which is why it
is okay to lop the hundreds off the first column (to make it
readable).
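
For anyone who wants to check those figures, a quick sketch (an
illustration of mine, assuming Python with scipy is available) reproduces
the test statistics, the odds ratio, and that expected cell count from the
table above:

from scipy.stats import chi2_contingency

table = [[10687, 124],
         [  175,   9]]

chi2, p, dof, expected = chi2_contingency(table, correction=False)
chi2_c, p_c, _, _ = chi2_contingency(table, correction=True)
odds_ratio = (10687 * 9) / (124 * 175)

print(chi2, chi2_c)      # about 21.2 and 18.2, both with p < .001
print(odds_ratio)        # about 4.4
print(expected[1][1])    # expected count for the cell with 9: about 2.2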

Now, you may return to your discussion of why the table is
not any good, and what is needed for a proper test.

-- 
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html





Re: meta-analysis

2001-06-17 Thread Donald Burrill

On 17 Jun 2001, Marc wrote (edited):

 I have to summarize the results of some clinical trials.
 The information given in the trials contain:
 
 Mean effects (days of hospitalization) in treatment & control groups; 
 numbers of patients in the groups;  p-values of a t-test (of the 
 difference between treatment and control).
 My question:  How can I calculate the variance of the treatment 
 difference, which I need to perform meta-analysis?  Note that the 
 numbers of patients in the groups are not equal.  
 Is it possible to do it like this:
 
 s^2 = (difference between control and treatment means)^2 / ((1/n1 + 1/n2) * t^2)

Yes, if you know t.  If all you know is that p < alpha for some alpha, 
then you know only that t > the t corresponding to alpha (AND you need to 
know whether the test had been one-sided or two-sided -- of course, you 
need to know that in any case); you can then substitute that corresponding 
t to obtain an upper bound on s^2 -- ASSUMING that the t was calculated 
using a pooled variance (your s^2), not using the expression for separate 
variances in the denominator:  (s1^2/n1 + s2^2/n2).

Note that this s^2 is NOT the variance of the treatment difference, 
which you said you wanted to know;  it is the pooled variance estimate 
of the variance within each group.  
 The variance of the difference in treatment means, which _may_ be what 
you are interested in, would be 

(difference)^2 / t^2 

with the same caveats concerning what you know about t.

 How exact would such an approximation be?

Depends on the precision with which  p  was reported.
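
For what it's worth, here is a short sketch of that back-calculation (my
own illustration, not part of the original exchange; it assumes Python
with scipy, a two-sided test, and the pooled-variance t discussed above,
and the example numbers are made up):

from scipy import stats

def variances_from_p(mean_t, mean_c, n_t, n_c, p_two_sided):
    df = n_t + n_c - 2
    t = stats.t.ppf(1.0 - p_two_sided / 2.0, df)         # |t| implied by the reported p
    diff = mean_t - mean_c
    var_diff = diff**2 / t**2                            # variance of the difference in means
    s2_pooled = diff**2 / ((1.0/n_t + 1.0/n_c) * t**2)   # pooled within-group variance
    return var_diff, s2_pooled

# made-up example: means of 6.1 vs 7.4 days, n = 48 and 52, reported p = 0.03
print(variances_from_p(6.1, 7.4, 48, 52, 0.03))

If all that is reported is "p < alpha", substituting alpha for p in the
same way gives upper bounds rather than estimates, per the caveats above.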

 
 Donald F. Burrill [EMAIL PROTECTED]
 184 Nashua Road, Bedford, NH 03110  603-471-7128






Re: individual item analysis

2001-06-17 Thread Rich Ulrich

On 15 Jun 2001 14:24:39 -0700, [EMAIL PROTECTED] (Doug
Sawyer) wrote:

 I am trying to locate a journal article or textbook that addresses
 whether or not exam questions can be normalized when the questions are
 grouped differently.  For example, could a question bank be developed
 where any subset of questions could be selected, and the assembled exam
 is normalized?
 
 What is the name of this area of statistics?  What authors or keywords
 would I use for such a search?  Do you know whether or not this can be done?


I believe that they do this sort of thing in scholastic achievement
tests, as a matter of course.  Isn't that how they make the transition
from year to year?  I guess this would be norming.

A few weeks ago, I discovered that there is a whole series of
tech-reports put out by one of the big test companies.  I would
look there for this sort of question.

-- 
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html





Re: Factor Analysis

2001-06-17 Thread Ken Reed

It's not really possible to explain this in lay person's terms.  The
difference between principal components analysis and common factor analysis
is roughly that PCA uses the raw scores, whereas factor analysis uses the
scores predicted from the other variables and does not include the
residuals.  That's as close to lay terms as I can get.

I have never heard a simple explanation of maximum likelihood estimation,
but -- MLE compares the observed covariance matrix with the covariance
matrix implied by the factor model, and uses that information to estimate
factor loadings etc. that would 'fit' a multivariate normal distribution.

MLE factor analysis is commonly used in structural equation modelling, hence
Tracey Continelli's conflation of it with SEM. This is not correct though.

I'd love to hear a simple explanation of MLE!
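
A rough way to see the difference for yourself (a sketch of mine, assuming
Python with scikit-learn; its FactorAnalysis is fitted by maximum
likelihood under a multivariate normal model):

from sklearn.datasets import load_iris
from sklearn.decomposition import PCA, FactorAnalysis
from sklearn.preprocessing import StandardScaler

X = StandardScaler().fit_transform(load_iris().data)

pca = PCA(n_components=2).fit(X)
fa = FactorAnalysis(n_components=2).fit(X)

print(pca.components_.T)    # PCA "loadings": directions of total (raw-score) variance
print(fa.components_.T)     # ML factor loadings: common variance only
print(fa.noise_variance_)   # estimated specific (residual) variances

The PCA directions account for the total variance, residuals included; the
ML factor solution keeps the specific variances out of the loadings, which
is the distinction sketched above.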



 From: [EMAIL PROTECTED] (Tracey Continelli)
 Organization: http://groups.google.com/
 Newsgroups: sci.stat.consult,sci.stat.edu,sci.stat.math
 Date: 15 Jun 2001 20:26:48 -0700
 Subject: Re: Factor Analysis
 
 Hi there,
 
 would someone please explain in lay person's terms the difference
 between principal components, common factors, and maximum likelihood
 estimation procedures for factor analyses?
 
 Should I expect my factors obtained through maximum likelihood
 estimation to be highly correlated?  Why?  When should I use a maximum
 likelihood estimation procedure, and when should I not use it?
 
 Thanks.
 
 Rita
 
 [EMAIL PROTECTED]
 
 
 Unlike the other methods, maximum likelihood allows you to estimate
 the entire structural model *simultaneously* [i.e., the effects of
 every independent variable upon every dependent variable in your
 model].  Most other methods only permit you to estimate the model in
 pieces, i.e., as a series of regressions whereby you regress every
 dependent variable upon every independent variable that has an arrow
 directly pointing to it.  Moreover, maximum likelihood actually
 provides a statistical test of significance, unlike many other methods
 which only provide generally accepted cut-off points but not an actual
 test of statistical significance.  There are very few cases in which I
 would use anything except a maximum likelihood approach, which you can
 use in LISREL or, if you use SPSS, in the add-on module AMOS, which
 will do this as well.
 
 
 Tracey



=
Instructions for joining and leaving this list and remarks about
the problem of INAPPROPRIATE MESSAGES are available at
  http://jse.stat.ncsu.edu/
=