Damned Bayesians! ;-)
-Mike Palij
New York University
[email protected]
----------- Original Message --------------
On Sun, 22 Apr 2012 14:31:43 -0700, Karl L Wuensch wrote:
What does "confidence" mean in the phrase "confidence interval"? Mike has
constructed a 95% CI for theta. Jim thinks there is a 95% probability that the
interval contains the population value of theta. Mike says "no, the
probability is either zero or 1." Karl steps in and proposes a wager. Karl
also happens to know the population value of theta, so he can declare the
winner of the wager. If Jim bets that the CI does contain the population theta
and Mike bets that it does not, what odds would result in this being a fair bet
(a bet for which the long-term expected gain is zero)? To be fair, Jim should
throw $19 into the pot for every $1 that Mike throws in. Convert those odds
into a probability (.95) and you have the degree of "confidence."
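(For the skeptical, a quick check of that arithmetic in R -- a minimal
sketch, assuming the interval really does cover theta in 95% of
repetitions:)

  p.cover <- .95                        # assumed long-run coverage of the CI
  jim.stake <- 19; mike.stake <- 1      # dollars each puts in the pot
  # Mike's expected gain: he collects Jim's $19 the 5% of the time the CI
  # misses theta, and loses his $1 the 95% of the time it covers theta.
  (1 - p.cover) * jim.stake - p.cover * mike.stake   # 0 -- a fair bet
  jim.stake / (jim.stake + mike.stake)               # 19:1 odds -> .95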
Regarding the query about which camp one belongs to, count me in
the "mish-mash" camp. :-)
Cheers,
Karl L. Wuensch
-----Original Message-----
From: Michael Palij [mailto:[email protected]]
Sent: Sunday, April 22, 2012 1:33 PM
To: Teaching in the Psychological Sciences (TIPS)
Cc: Michael Palij
Subject: Re:[tips] confidence intervals
On Sat, 21 Apr 2012 21:18:11 -0700 Jim Clark wrote:
>On Sat, 21 Apr 2012 08:01:57 -0700, Jim Clark wrote:
>[snip]
>>>What I do know is that if you select a sample of N observations with
>>>mean M and standard deviation S out of a population with mean MU and
>>>standard deviation SIGMA, then:
>>
>>>1. M will fall within MU +/- z(alpha/2)*SIGMA/sqrt(N) with
>>>probability = 1 - alpha (hypothesis testing), and
>
>Michael Palij <[email protected]> 21-Apr-12 4:36 PM >>>
>I believe this may be Fisher's position with respect to a one-sample test.
>
>>>2. Equivalently, MU will fall within M +/- z(alpha/2)*SIGMA/sqrt(N)
>>>with probability = 1 - alpha (confidence intervals).
>>
>>If you have only one CI, your second point is wrong -- this is what
>>Neyman was emphasizing when he said that for a given CI, it either
>>contained the population parameter (Prob = 1.00) or it didn't (Prob = 0.00).
>
>JC:
>Sorry to disagree, but if 1 is true, 2 is also true. That is, if M is
>expected to fall within a certain distance of MU with a certain
>probability, then MU is expected to fall within that same distance of M
>with the same probability. In the latter case, it is the CI that
>varies from sample to sample with 1-alpha of the CIs containing MU, which is
>staying in position.
If you are saying that if you draw an infinite number of Ms and calculate a
CI for each, then 1-alpha of that infinite series of CIs will contain MU,
you are correct. Any single CI, however, either contains MU or it doesn't.
The CI, as Neyman argued, varies from sample to sample and may or may not
contain the constant MU.
>Just as you only have one CI in 2, you only have one M in 1, so it is
>not obvious to me why CIs are prevented from being conceptualized as
>selected from a hypothetical sampling distribution (according to Neyman,
>given Michael's summary) while Ms are not so constrained.
Short answer: M is a random variable even if you only have one.
The sampling distribution of means is its probability distribution.
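A minimal R sketch of that short answer (the population values MU = 100,
SIGMA = 15, and N = 25 are arbitrary, chosen only for illustration):

  set.seed(1)
  MU <- 100; SIGMA <- 15; N <- 25
  # Draw 10,000 samples and keep each sample's mean M:
  M <- replicate(10000, mean(rnorm(N, mean = MU, sd = SIGMA)))
  mean(M)   # close to MU = 100
  sd(M)     # close to SIGMA/sqrt(N) = 3, the standard error of M

Each M is one draw from that sampling distribution, which is why a single M
is still a random variable.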
Long answer:
The following is based on my reading of Erich Lehmann's (2011) presentation
of Neyman's confidence intervals, which includes quotes from Neyman
(pp. 80-82 of Fisher, Neyman, and the Creation of Classical Statistics):
(1) Assume that there is an unknown population parameter Theta (it could be
MU or any other parameter/constant) which is estimated by a sample statistic
T (a random variable).
(2) To determine how well T estimates Theta, one may want to define an
interval whose lower limit is
Theta-lower = T - k1*St, where T is our estimate, St is the standard error
of T, and k1 is a constant,
and whose upper limit is
Theta-upper = T + k2*St, where k2 is another constant.
The true value of Theta falls between these limits with some probability, or
Theta-lower < Theta < Theta-upper
NOTE: Theta is a constant, while Theta-lower and Theta-upper are random
variables. (A toy R version of this construction appears after step 5.)
(3) A specific probability level is selected, independent of the true value
of Theta, and is used to choose the values of the constants k1 and k2,
consistent with an appropriate probability model (e.g., normal,
t-distribution, etc.) and other considerations (e.g., selecting values that
produce the shortest CIs). We may want an interval that contains Theta 95%
of the time (i.e., a frequentist definition of probability).
(4) In the long run, decisions about Theta will be correct 1-alpha of the time
(in our case alpha=.05). Quoting Neyman:
|It will be noticed that in the above description the probability
|statements refer to the problems of estimation with which the
|statistician will be concerned in the future. (p. 81)
Note: the probability statement is not about this particular interval.
(5) What can be concluded once a sample has been drawn?
Assume Theta-lower = 1 and Theta-upper = 2: once the sample is drawn, can
one make a probability statement that Theta falls between these two values?
Quoting Neyman:
|The answer is obviously in the negative. The parameter Theta is an
|unknown constant and no probability statement concerning its value may
|be made, that is, except for the hypothetical and trivial ones that the
|probability of 1 <= Theta <= 2 equals 1 if Theta lies between these
|limits, and 0 (zero) if it doesn't.
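Here is the toy R version of steps (2)-(5) promised above (Theta, SIGMA, and
N are made-up values; Theta is "known" here only so we can check the
realized interval):

  set.seed(2)
  Theta <- 100; SIGMA <- 15; N <- 25         # Theta known only for illustration
  x <- rnorm(N, mean = Theta, sd = SIGMA)    # one observed sample
  T.hat <- mean(x)                           # the estimator T, a random variable
  St <- SIGMA / sqrt(N)                      # standard error of T
  k <- qnorm(.975)                           # k1 = k2 = 1.96 for a 95% interval
  c(T.hat - k * St, T.hat + k * St)          # Theta-lower, Theta-upper
  # Once realized, coverage is no longer .95 -- it is TRUE (1) or FALSE (0):
  (T.hat - k * St < Theta) & (Theta < T.hat + k * St)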
It was Fisher, with his fiducial intervals, who conceptualized the
population mean as a random variable with an associated probability
distribution, but many people had difficulty understanding this (i.e., how
does a constant become a random variable?). Fisher's position appears to be
that, since the population parameter is unknown, there is a range of values
it could have while still producing a null result for a statistical test
(i.e., while still being consistent with an interval of plausible values),
and it is this distribution of parameter values, used in constructing his
interval, that distinguishes it from Neyman's intervals.
>To turn the issue around a bit,
>given the way I have phrased #1 (Hypothesis Testing), it is also true
>that for a given sample M, it will either be within a certain distance of MU
>or not.
>But so what? We still don't say that p of being in the rejection
>region is 0 or 1 based on the outcome for this one trial out of a
>hypothetical sampling distribution.
We don't say "that p of being in the rejection region is 0 or 1" because p
is the probability of the obtained result (or one more extreme) under the
null hypothesis. I don't see how this is connected to an estimate of a
constant like MU.
>Perhaps I'm too concrete a thinker, but if I randomly sample 100,000
>times from a population and calculate the ps as described above, then
>1 and 2 MUST both hold true. It is simply impossible for them not
>to. Specifically, with respect to 2 and using z = 1.96 and sigma, then
>95% of the CIs will contain MU, which means that MU falls within my CI 95% of
>the time.
Yes, of the 100,000 samples you draw, the 95% CI will contain MU in about
95% of them. (In my previous post I cited the book by Shravan Vasishth and
Michael Broe, who show how to run such a simulation in R -- in their example
with 100 samples, however, only 94% of the intervals contained MU.)
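A bare-bones version of such a simulation (my own sketch, not Vasishth and
Broe's code; MU, SIGMA, and N are arbitrary):

  set.seed(3)
  MU <- 100; SIGMA <- 15; N <- 25
  covers <- replicate(100000, {
    M <- mean(rnorm(N, mean = MU, sd = SIGMA))
    half <- qnorm(.975) * SIGMA / sqrt(N)    # half-width of the 95% CI
    (M - half < MU) & (MU < M + half)        # does this CI contain MU?
  })
  mean(covers)          # about .95 across 100,000 intervals
  mean(covers[1:100])   # a run of only 100 can easily drift from .95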
But the issue is not what you can conclude from a series of CIs; Neyman
showed that the long-run probability for the series will be 1 - alpha. The
point you originally raised is what a single CI tells you. As Lehmann
points out:
|Thus, while Neyman's confidence intervals became the accepted solution
|of the estimation problem, in practice they were often interpreted in a
|way that was much closer to Fisher's view than Neyman's. (p. 89)
-Mike Palij
New York University
[email protected]