I agree with Jim that such results are not "ludicrous" (my poor choice of words) and that they should be treated as estimates that fell outside the range of what is possible (for example, a z-score associated with a raw score so high or low that such a raw test score wouldn't be possible, because you can't get fewer than 0 items correct and you can't get more items correct than were on the test). If such results show, for example, a p greater than 1 or an F less than 0, I think those results should not just be reproduced without comment. I was mainly concerned with cases where a person would think that the negative sign was actually providing useful information and it was not commented on. Also, using the term r-squared to refer to a negative number just seems entirely confusing, and reporting such a thing uncommented or unrevised would indicate to me that the person didn't realize Jim's point: that the number is an estimate that is not demonstrating a real effect. I wasn't advocating ignoring the misleading numbers, only that they not be reported without recognizing that they indicate something. Also, I think you would have to be concerned with rounding or calculation error if you ever got a negative F value. My point was that such a result should be pointed out, examined and adjusted instead of just being reported unaltered.
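To make the negative "r-squared" case concrete, here is a minimal sketch in Python. It assumes the usual adjusted R-squared formula, 1 - (1 - R^2)(n - 1)/(n - k - 1), which may not be the exact estimator anyone in this thread had in mind. The function names and the input values are hypothetical, chosen only for illustration. It keeps the raw estimate alongside the value truncated at 0, so the out-of-range result is flagged and not silently reported or dropped.

```python
def adjusted_r2(r2: float, n: int, k: int) -> float:
    """Adjusted R^2 for sample R^2 `r2`, sample size `n`, and `k` predictors.

    Assumes the common formula 1 - (1 - R^2) * (n - 1) / (n - k - 1).
    """
    if n - k - 1 <= 0:
        raise ValueError("need n > k + 1")
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


def raw_and_reported(r2: float, n: int, k: int) -> tuple[float, float]:
    """Return (raw estimate, estimate truncated at 0).

    Truncating at 0 follows the reading that a negative estimate means
    the best estimate of the population value is 0.
    """
    raw = adjusted_r2(r2, n, k)
    return raw, max(0.0, raw)


# Hypothetical inputs: a small R^2, a small sample, several predictors.
raw, reported = raw_and_reported(r2=0.10, n=20, k=5)
print(f"raw adjusted R^2 = {raw:.4f}, reported as {reported:.4f}")
if raw < 0:
    print("note: estimate fell below 0, the minimum possible value")
```

Returning both numbers is a deliberate choice here: the truncated value is what one would report, while the raw value is still available for things like averaging across simulated samples.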
Rick

Dr. Rick Froman, Chair
Division of Humanities and Social Sciences
Professor of Psychology
Box 3055
John Brown University
2000 W. University
Siloam Springs, AR 72761
[email protected]
(479)524-7295
http://tinyurl.com/DrFroman
"The LORD detests both Type I and Type II errors." Proverbs 17:15

Jim Clark
Wed, 21 Apr 2010 18:02:14 -0700

Hi

In most of the examples given, I think people are being too harsh to say that the results are ridiculous or meaningless. When one "estimates" some hypothetical value there may be conditions where the underlying value is so close to a maximum or a minimum that your estimate falls above the max or below the min. For example, estimating r**2 in the population (i.e., rho**2) from the sample r might produce a negative r**2. Or calculating a biserial r to estimate what the correlation would be if a dichotomized x was continuous, your estimate might end up being greater than +1 or less than -1. Rather than being meaningless, the former indicates to me that the best estimate of the population r**2 is 0 and the latter that r for a continuous x is estimated to be at or close to 1.

The critical thing in these examples is that you are NOT describing some property of the data, but rather estimating some hypothetical property on the basis of the data. And estimates can always fall on either side of the actual value, even if the region on one side (or parts of it) is impossible.

If there were some adjustment that produced a negative F, as in Rick's example below, it is important to note that the relevant p is NOT the probability of that F but rather the probability of an F that size OR GREATER given whatever the null hypothesis is; that is, p values are areas under distributions across some range of values for your statistic. Under those circumstances, I would say that the appropriate p to report would be 1.0, just as for F = 0, which is possible.
And in the same manner that Bonferroni ps greater than 1 are reported as 1, as mentioned in one of my examples of "misbehaving" stats.

I'm sure we would never make the following mistake, but I think that one danger of saying statistic X is meaningless given its calculated value would be that naive users might use that as an excuse to ignore X and report some acceptable, but perhaps less correct statistic. For example, my R**2 adjusted was negative, so I will ignore it and report R**2 (which ironically is more likely to be "meaningless" than R**2 adjusted given a sufficient number of predictors and small sample size). People here were clearly recommending a more thoughtful approach.

Or in the SPSS simulation of rb that I distributed, it would be incorrect, I think, to ignore the rbs>1 or <-1 in determining the expected value of rb. I noted briefly that the mean rb of the 1,000 samples (10,000 in some other simulations I ran) was very close to rho between continuous X and Y. But there would be circumstances where the fit would appear to be biased if "deviant" samples were ignored.

None of this is meant to undermine the many valid reservations about, for example, the biserial r or other statistics. I'm just less certain than others that an "impossible" value for some statistic itself allows a ready judgment about its appropriateness.

Take care
Jim

James M. Clark
Professor of Psychology
204-786-9757
204-774-4134 Fax
[email protected]

>>> Rick Froman <[email protected]> 21-Apr-10 2:34:40 PM >>>

Or in this case, using a terminology that is clearly a nonsensical violation of the obvious. Saying that you have computed a negative coefficient of determination, for example, would not be so obviously ludicrous as saying you have computed a negative r-squared (if you know that squaring any value, positive or negative, can't produce a negative number).

By the way, I hope a negative F is more than rare.
Given that it is a ratio of between group and within group variances (and variances like area cannot be negative), if there is some "correction" or "adjustment" that produces a negative F, I can't see how it would be anything but meaningless (given that the probability of a negative F as taken from the F-distribution is zero -- none of the area under the F-curve exists to the left of 0). Are there also procedures that produce negative p-values?

Rick

Dr. Rick Froman, Chair
Division of Humanities and Social Sciences
Box 3055 x7295
[email protected]
http://tinyurl.com/DrFroman
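The upper-tail reading of p discussed in this thread can be sketched with a small simulation. This is only an illustration under stated assumptions: a one-way ANOVA with 3 groups of 10 standard-normal scores under the null, a Monte Carlo approximation instead of the exact F distribution, and hypothetical function names (`f_ratio`, `upper_tail_p`, `bonferroni`). Because no simulated F can fall below 0, any observed value at or below 0 has an upper-tail proportion of exactly 1. The Bonferroni helper caps adjusted p values at 1 in the same spirit.

```python
import random


def f_ratio(groups):
    """One-way ANOVA F: mean square between groups over mean square within."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))


def upper_tail_p(f_obs, k=3, n_per_group=10, reps=2000, seed=1):
    """Monte Carlo estimate of P(F >= f_obs) when all group means are equal."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    hits = 0
    for _ in range(reps):
        groups = [[rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
                  for _ in range(k)]
        if f_ratio(groups) >= f_obs:
            hits += 1
    return hits / reps


def bonferroni(p, m):
    """Bonferroni-adjusted p for m comparisons, capped at 1."""
    return min(1.0, p * m)


print(upper_tail_p(-0.5))   # every simulated F is >= 0, so this proportion is 1.0
print(upper_tail_p(0.0))    # same reasoning for F = 0
print(bonferroni(0.4, 5))   # 2.0 before capping, reported as 1.0
```

In practice one would use an exact F distribution function from a statistics package for the p value; the simulation is used here only to keep the sketch self-contained.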
