On Sat, 21 Apr 2012 21:18:11 -0700, Jim Clark wrote:
>On Sat, 21 Apr 2012 08:01:57 -0700, Jim Clark wrote:
>[snip]
>>>What I do know is that if you select a sample of N observations with
>>>mean M and standard deviation S out of a population with mean MU and
>>>standard deviation SIGMA, then:
>>
>>>1. M will fall within MU +/- z(alpha/2)*SIGMA/sqrt(N) with
>>>probability = 1 - alpha (hypothesis testing), and

Michael Palij <[email protected]> 21-Apr-12 4:36 PM wrote:
>I believe this may be Fisher's position with respect to a one-sample test.

>>>2. Equivalently, MU will fall within M +/- z(alpha/2)*SIGMA/sqrt(N)
>>>with probability = 1 - alpha (confidence intervals).
>>
>>If you have only one CI, your second point is wrong -- this is what
>>Neyman was emphasizing when he said that for a given CI, it either
>>contained the population parameter (Prob = 1.00) or it didn't
>>(Prob = 0.00).

JC:
Sorry to disagree, but if 1 is true, 2 is also true. That is, if M is
expected to fall within a certain distance of MU with a certain
probability, then MU is expected to fall within that same distance of M
with the same probability. In the latter case, it is the CI that varies
from sample to sample, with 1-alpha of the CIs containing MU, which is
staying in position.
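[As an aside: the long-run equivalence of statements 1 and 2 is easy to
check by simulation. A minimal sketch in Python, not part of the original
exchange; the population parameters, N, and alpha are arbitrary choices.]

```python
import math
import random

random.seed(1)

MU, SIGMA = 100.0, 15.0   # assumed (arbitrary) population parameters
N, Z = 25, 1.96           # sample size and z(alpha/2) for alpha = .05
HALF_WIDTH = Z * SIGMA / math.sqrt(N)

trials = 100_000
m_near_mu = 0   # statement 1: M falls within MU +/- z*SIGMA/sqrt(N)
mu_in_ci = 0    # statement 2: MU falls within M +/- z*SIGMA/sqrt(N)

for _ in range(trials):
    m = sum(random.gauss(MU, SIGMA) for _ in range(N)) / N
    if abs(m - MU) < HALF_WIDTH:
        m_near_mu += 1
    if m - HALF_WIDTH < MU < m + HALF_WIDTH:
        mu_in_ci += 1

# The two counts are identical by construction: both events are
# |M - MU| < z*SIGMA/sqrt(N), and each long-run proportion is close
# to 1 - alpha = .95.
print(m_near_mu / trials, mu_in_ci / trials)
```

[Both proportions come out the same, and close to .95, which is the
long-run point; the disagreement below is over what that licenses one to
say about any single interval.]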
If you are saying that if you draw an infinite number of Ms and calculate
a CI for each, then the infinite series of CIs will contain MU with
probability 1-alpha, you are correct. Any single CI, however, either
contains MU or it doesn't. As Neyman argued, the CI varies from sample to
sample, and any particular interval may or may not contain the constant MU.

>Just as you only have one CI in 2, you only have one M in 1, so it is not
>obvious to me why CIs are prevented from being conceptualized as selected
>from a hypothetical sampling distribution (according to Neyman given
>Michael's summary) while Ms are not so constrained.

Short answer: M is a random variable even if you only have one. The
sampling distribution of means is its probability distribution.

Long answer: The following is based on my reading of Erich Lehmann's
(2011) presentation of Neyman's confidence intervals, which includes
quotes from Neyman (pp. 80-82 of Fisher, Neyman, and the Creation of
Classical Statistics):

(1) Assume that there is an unknown population parameter Theta (it could
be MU or any other parameter/constant) which is estimated by a sample
statistic T (a random variable).

(2) To determine how well T estimates Theta, one may want to define an
interval whose lower limit is

  Theta-lower = T - k1*St,

where T is our estimate, St is the standard error of T, and k1 is a
constant, and whose upper limit is

  Theta-upper = T + k2*St,

where k2 is another constant. The true value of Theta falls between these
values with some probability, or

  Theta-lower < Theta < Theta-upper.

NOTE: Theta is a constant, while Theta-lower and Theta-upper are random
variables.

(3) A specific probability level is selected independent of the true
value of Theta; it is used to choose the values of the constants k1 and
k2, consistent with an appropriate probability model (i.e., normal,
t-distribution, etc.)
and other considerations (e.g., selecting values that produce the
shortest CIs). We may want an interval that contains Theta 95% of the
time (i.e., a frequentist definition of probability).

(4) In the long run, decisions about Theta will be correct 1-alpha of the
time (in our case, alpha = .05). Quoting Neyman:

|It will be noticed that in the above description the probability
|statements refer to the problems of estimation with which the
|statistician will be concerned in the future. (p81)

Note: the probability statement is not about this particular interval.

(5) What can be concluded once a sample has been drawn? Assume
Theta-lower = 1 and Theta-upper = 2; what is the probability that Theta
falls between these two values? Quoting Neyman:

|The answer is obviously in the negative. The parameter Theta
|is an unknown constant and no probability statement concerning
|its value may be made, that is, except for the hypothetical
|and trivial ones that the probability of 1 <= Theta <= 2
|equals 1 if Theta lies between these limits, and 0 (zero) if
|it doesn't.

It was Fisher, with his fiducial intervals, who conceptualized the
population mean as a random variable with an associated probability
distribution, but many people had difficulty understanding this (i.e.,
how does a constant become a random variable?). Fisher's position appears
to be that since the population parameter is unknown, there is a range of
values it can take while still producing a null result for a statistical
test, and it is this distribution of parameter values, used in his
interval, that distinguishes it from Neyman's intervals.

>To turn the issue around a bit, given the way I have phrased #1
>(Hypothesis Testing), it is also true that for a given sample M, it will
>either be within a certain distance of MU or not. But so what?
>We still don't say that p of being in the rejection region is 0 or 1
>based on the outcome for this one trial out of a hypothetical sampling
>distribution.

We don't say "that p of being in the rejection region is 0 or 1" because
the p is the probability of an obtained result. I don't see how this is
connected to an estimate of a constant like MU.

>Perhaps I'm too concrete a thinker, but if I randomly sample 100,000
>times from a population and calculate the ps as described above, then
>both 1 and 2 MUST hold true. It is simply impossible for them not to.
>Specifically, with respect to 2 and using z = 1.96 and sigma, then 95%
>of the CIs will contain MU, which means that MU falls within my CI 95%
>of the time.

Yes, of the 100,000 samples you draw, the 95% CI will contain MU in about
95% of them (in my previous post, where I cite the book by Shravan
Vasishth and Michael Broe, they show how to do such a simulation in R --
in one run with 100 samples, however, only 94% of the intervals contained
MU). But the issue is not what you can conclude from a series of CIs;
Neyman has shown that the long-run probability for the series will be
1 - alpha. The point you originally raised is what a single CI tells you.
As Lehmann points out:

|Thus, while Neyman's confidence intervals became the accepted solution
|of the estimation problem, in practice they were often interpreted in a
|way that was much closer to Fisher's view than Neyman's. (p89)

-Mike Palij
New York University
[email protected]
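P.S. For anyone who wants to try the kind of coverage simulation
mentioned above without R, here is a rough equivalent in Python (my own
sketch, not Vasishth and Broe's code; MU, SIGMA, and N are arbitrary):

```python
import math
import random

random.seed(2)

MU, SIGMA = 60.0, 4.0   # assumed (arbitrary) population parameters
N, Z = 40, 1.96         # sample size and z(.025) for a 95% CI
HALF_WIDTH = Z * SIGMA / math.sqrt(N)

def ci_contains_mu() -> bool:
    """Draw one sample, build the 95% CI around M, check if it covers MU."""
    m = sum(random.gauss(MU, SIGMA) for _ in range(N)) / N
    return m - HALF_WIDTH < MU < m + HALF_WIDTH

# Long-run view: across many samples, about 95% of intervals cover MU ...
coverage = sum(ci_contains_mu() for _ in range(100_000)) / 100_000
print(coverage)

# ... but any single interval either covers MU or it doesn't, which is
# Neyman's point about what a lone CI can tell you:
print(ci_contains_mu())   # True or False, nothing in between
```

The long-run proportion lands near .95, while each individual call
returns only a yes or a no, which is the distinction at issue in the
thread.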
