On Sat, 21 Apr 2012 21:18:11 -0700, Jim Clark wrote:
>On Sat, 21 Apr 2012 08:01:57 -0700, Jim Clark wrote:
>[snip]
>>>What I do know is that if you select a sample of N observations with
>>>mean M and standard deviation S out of a population with mean MU and
>>>standard deviation SIGMA, then:
>>
>>>1. M will fall within MU +/- z(alpha/2)*SIGMA/sqrt(N) with
>>>probability = 1 - alpha (hypothesis testing), and

Michael Palij <[email protected]> 21-Apr-12 4:36 PM wrote:
>I believe this may be Fisher's position with respect to a one-sample test.

>>>2. Equivalently, MU will fall within M +/- z(alpha/2)*SIGMA/sqrt(N)
>>>with probability = 1 - alpha (confidence intervals).
>>
>>If you have only one CI, your second point is wrong -- this is what
>>Neyman was emphasizing when he said that for a given CI, it either
>>contained the population parameter (Prob = 1.00) or it didn't
>>(Prob = 0.00).

JC:
Sorry to disagree, but if 1 is true, 2 is also true. That is, if M is
expected to fall within a certain distance of MU with a certain
probability, then MU is expected to fall within that same distance of M
with the same probability. In the latter case, it is the CI that varies
from sample to sample, with 1-alpha of the CIs containing MU, which is
staying in position.
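[As an aside: the long-run equivalence of statements 1 and 2 is easy to
check by simulation. A minimal sketch in Python, not part of the original
exchange; the population parameters, N, and alpha are arbitrary choices.]

```python
import math
import random

random.seed(1)

MU, SIGMA = 100.0, 15.0   # assumed (arbitrary) population parameters
N, Z = 25, 1.96           # sample size and z(alpha/2) for alpha = .05
HALF_WIDTH = Z * SIGMA / math.sqrt(N)

trials = 100_000
m_near_mu = 0   # statement 1: M falls within MU +/- z*SIGMA/sqrt(N)
mu_in_ci = 0    # statement 2: MU falls within M +/- z*SIGMA/sqrt(N)

for _ in range(trials):
    m = sum(random.gauss(MU, SIGMA) for _ in range(N)) / N
    if abs(m - MU) < HALF_WIDTH:
        m_near_mu += 1
    if m - HALF_WIDTH < MU < m + HALF_WIDTH:
        mu_in_ci += 1

# The two counts are identical by construction: both events are
# |M - MU| < z*SIGMA/sqrt(N), and each long-run proportion is close
# to 1 - alpha = .95.
print(m_near_mu / trials, mu_in_ci / trials)
```

[Both proportions come out the same, and close to .95, which is the
long-run point; the disagreement below is over what that licenses one to
say about any single interval.]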
If you are saying that if you draw an infinite number of Ms and calculate
a CI for each, then the infinite series of CIs will contain MU with
probability 1-alpha, you are correct. Any single CI, however, either
contains MU or it doesn't. As Neyman argued, the CI varies from sample to
sample, and any particular interval may or may not contain the constant MU.

>Just as you only have one CI in 2, you only have one M in 1, so it is not
>obvious to me why CIs are prevented from being conceptualized as selected
>from a hypothetical sampling distribution (according to Neyman given
>Michael's summary) while Ms are not so constrained.

Short answer: M is a random variable even if you only have one. The
sampling distribution of means is its probability distribution.

Long answer: The following is based on my reading of Erich Lehmann's
(2011) presentation of Neyman's confidence intervals, which includes
quotes from Neyman (pp. 80-82 of Fisher, Neyman, and the Creation of
Classical Statistics):

(1) Assume that there is an unknown population parameter Theta (it could
be MU or any other parameter/constant) which is estimated by a sample
statistic T (a random variable).

(2) To determine how well T estimates Theta, one may want to define an
interval whose lower limit is

  Theta-lower = T - k1*St,

where T is our estimate, St is the standard error of T, and k1 is a
constant, and whose upper limit is

  Theta-upper = T + k2*St,

where k2 is another constant. The true value of Theta falls between these
values with some probability, or

  Theta-lower < Theta < Theta-upper.

NOTE: Theta is a constant, while Theta-lower and Theta-upper are random
variables.

(3) A specific probability level is selected independent of the true
value of Theta; it is used to choose the values of the constants k1 and
k2, consistent with an appropriate probability model (i.e., normal,
t-distribution, etc.)
and other considerations (e.g., selecting values that produce the
shortest CIs). We may want an interval that contains Theta 95% of the
time (i.e., a frequentist definition of probability).

(4) In the long run, decisions about Theta will be correct 1-alpha of the
time (in our case, alpha = .05). Quoting Neyman:

|It will be noticed that in the above description the probability
|statements refer to the problems of estimation with which the
|statistician will be concerned in the future. (p81)

Note: the probability statement is not about this particular interval.

(5) What can be concluded once a sample has been drawn? Assume
Theta-lower = 1 and Theta-upper = 2; what is the probability that Theta
falls between these two values? Quoting Neyman:

|The answer is obviously in the negative. The parameter Theta
|is an unknown constant and no probability statement concerning
|its value may be made, that is, except for the hypothetical
|and trivial ones that the probability of 1 <= Theta <= 2
|equals 1 if Theta lies between these limits, and 0 (zero) if
|it doesn't.

It was Fisher, with his fiducial intervals, who conceptualized the
population mean as a random variable with an associated probability
distribution, but many people had difficulty understanding this (i.e.,
how does a constant become a random variable?). Fisher's position appears
to be that since the population parameter is unknown, there is a range of
values it can take while still producing a null result for a statistical
test, and it is this distribution of parameter values, used in his
interval, that distinguishes it from Neyman's intervals.

>To turn the issue around a bit, given the way I have phrased #1
>(Hypothesis Testing), it is also true that for a given sample M, it will
>either be within a certain distance of MU or not. But so what?
>We still don't say that p of being in the rejection region is 0 or 1
>based on the outcome for this one trial out of a hypothetical sampling
>distribution.

We don't say "that p of being in the rejection region is 0 or 1" because
the p is the probability of an obtained result. I don't see how this is
connected to an estimate of a constant like MU.

>Perhaps I'm too concrete a thinker, but if I randomly sample 100,000
>times from a population and calculate the ps as described above, then
>both 1 and 2 MUST hold true. It is simply impossible for them not to.
>Specifically, with respect to 2 and using z = 1.96 and sigma, then 95%
>of the CIs will contain MU, which means that MU falls within my CI 95%
>of the time.

Yes, of the 100,000 samples you draw, the 95% CI will contain MU in about
95% of them (in my previous post, where I cite the book by Shravan
Vasishth and Michael Broe, they show how to do such a simulation in R --
in one run with 100 samples, however, only 94% of the intervals contained
MU). But the issue is not what you can conclude from a series of CIs;
Neyman has shown that the long-run probability for the series will be
1 - alpha. The point you originally raised is what a single CI tells you.
As Lehmann points out:

|Thus, while Neyman's confidence intervals became the accepted solution
|of the estimation problem, in practice they were often interpreted in a
|way that was much closer to Fisher's view than Neyman's. (p89)

-Mike Palij
New York University
[email protected]
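P.S. For anyone who wants to try the kind of coverage simulation
mentioned above without R, here is a rough equivalent in Python (my own
sketch, not Vasishth and Broe's code; MU, SIGMA, and N are arbitrary):

```python
import math
import random

random.seed(2)

MU, SIGMA = 60.0, 4.0   # assumed (arbitrary) population parameters
N, Z = 40, 1.96         # sample size and z(.025) for a 95% CI
HALF_WIDTH = Z * SIGMA / math.sqrt(N)

def ci_contains_mu() -> bool:
    """Draw one sample, build the 95% CI around M, check if it covers MU."""
    m = sum(random.gauss(MU, SIGMA) for _ in range(N)) / N
    return m - HALF_WIDTH < MU < m + HALF_WIDTH

# Long-run view: across many samples, about 95% of intervals cover MU ...
coverage = sum(ci_contains_mu() for _ in range(100_000)) / 100_000
print(coverage)

# ... but any single interval either covers MU or it doesn't, which is
# Neyman's point about what a lone CI can tell you:
print(ci_contains_mu())   # True or False, nothing in between
```

The long-run proportion lands near .95, while each individual call
returns only a yes or a no, which is the distinction at issue in the
thread.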
