Hi

In most of the examples given, I think people are being too harsh in saying that the results are ridiculous or meaningless. When one "estimates" some hypothetical value, the underlying value may be so close to a maximum or a minimum that the estimate falls above the max or below the min. For example, estimating r**2 in the population (i.e., rho**2) from the sample r might produce a negative r**2. Or, when calculating a biserial r to estimate what the correlation would be if a dichotomized x were continuous, the estimate might end up greater than +1 or less than -1. Rather than being meaningless, the former indicates to me that the best estimate of the population r**2 is 0, and the latter that r for a continuous x is estimated to be at or close to +1 (or -1).
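The pattern described above is easy to reproduce. The following is a minimal sketch (an illustrative simulation, not the SPSS one circulated in this thread): draw samples from a bivariate normal with a known rho, dichotomize x at zero, and compute the biserial r from the usual formula rb = (M1 - M0)pq/(h*sy), where h is the standard-normal ordinate at the split point. The sample size, rho, and number of replications are arbitrary choices for illustration.

```python
import math
import random
from statistics import NormalDist, mean, pstdev

ND = NormalDist()

def biserial_r(d, y):
    """Biserial r from a 0/1 indicator d and a continuous y,
    via rb = (mean_1 - mean_0) * p * q / (h * s_y)."""
    n = len(d)
    p = sum(d) / n                        # proportion of 1s
    q = 1.0 - p
    y1 = [yy for dd, yy in zip(d, y) if dd == 1]
    y0 = [yy for dd, yy in zip(d, y) if dd == 0]
    h = ND.pdf(ND.inv_cdf(p))             # ordinate at the dichotomization point
    return (mean(y1) - mean(y0)) * p * q / (h * pstdev(y))

random.seed(1)
rho, n = 0.95, 20                         # strong rho, small n: estimates press the ceiling
rbs = []
for _ in range(1000):
    x = [random.gauss(0, 1) for _ in range(n)]
    e = [random.gauss(0, 1) for _ in range(n)]
    y = [rho * xi + math.sqrt(1 - rho**2) * ei for xi, ei in zip(x, e)]
    d = [1 if xi > 0 else 0 for xi in x]  # dichotomize x at its population median
    if 0 < sum(d) < n:                    # skip degenerate splits
        rbs.append(biserial_r(d, y))

print(f"max rb = {max(rbs):.3f}, mean rb = {mean(rbs):.3f}")
```

Individual samples produce rb greater than +1, yet the mean across samples stays close to rho, which is the point: the "impossible" values belong in the average.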
The critical thing in these examples is that you are NOT describing some property of the data, but rather estimating some hypothetical property on the basis of the data. And estimates can always fall on either side of the actual value, even if the region on one side (or parts of it) is impossible.

If there were some adjustment that produced a negative F, as in Rick's example below, it is important to note that the relevant p is NOT the probability of that F but rather the probability of an F that size OR GREATER given whatever the null hypothesis is; that is, p values are areas under distributions across some range of values for your statistic. Under those circumstances, I would say that the appropriate p to report would be 1.0, just as for F = 0, which is possible. This is the same manner in which Bonferroni ps greater than 1 are reported as 1, as mentioned in one of my examples of "misbehaving" stats.

I'm sure we would never make the following mistake, but one danger of saying statistic X is meaningless given its calculated value is that naive users might use that as an excuse to ignore X and report some acceptable, but perhaps less correct, statistic. For example: my R**2 adjusted was negative, so I will ignore it and report R**2 (which, ironically, is more likely to be "meaningless" than R**2 adjusted given a sufficient number of predictors and a small sample size). People here were clearly recommending a more thoughtful approach.

Or, in the SPSS simulation of rb that I distributed, it would be incorrect, I think, to ignore the rbs > 1 or < -1 in determining the expected value of rb. I noted briefly that the mean rb of the 1,000 samples (10,000 in some other simulations I ran) was very close to rho between continuous X and Y. But there would be circumstances where the fit would appear to be biased if "deviant" samples were ignored.

None of this is meant to undermine the many valid reservations about, for example, the biserial r or other statistics.
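The "misbehaving" statistics in this exchange reduce to simple arithmetic. A minimal sketch with hypothetical numbers: the adjusted R**2 formula going negative when R**2 is small relative to predictors and sample size, the Bonferroni p capped at 1, and a one-way F computed from data, which as a ratio of two sums of squared deviations cannot be negative.

```python
from statistics import mean

def adjusted_r2(r2, n, k):
    """Adjusted R**2 = 1 - (1 - R**2)(n - 1)/(n - k - 1)."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

def bonferroni(p, m):
    """Bonferroni-corrected p for m comparisons, reported as 1 when the product exceeds 1."""
    return min(p * m, 1.0)

def one_way_F(groups):
    """F = MS_between / MS_within for a one-way ANOVA."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand = mean(x for g in groups for x in g)
    ssb = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ssw = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ssb / (k - 1)) / (ssw / (n - k))

# Small R**2, many predictors, few cases: the adjustment goes negative,
# which reads as "best estimate of the population R**2 is zero".
print(adjusted_r2(0.05, n=20, k=5))

# A raw Bonferroni product can exceed 1; it is reported as 1.
print(bonferroni(0.04, m=30))

# Hypothetical groups: F is a ratio of nonnegative quantities.
groups = [[4.0, 5.0, 6.0], [5.0, 6.0, 7.0], [8.0, 9.0, 10.0]]
print(one_way_F(groups))
```

The F computed from data can reach 0 (when all group means are equal) but never pass below it; anything negative would have to come from some later "adjustment", which is where the p = 1.0 reading applies.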
I'm just less certain than others that an "impossible" value for some statistic itself allows a ready judgment about its appropriateness.

Take care
Jim

James M. Clark
Professor of Psychology
204-786-9757
204-774-4134 Fax
[email protected]

>>> Rick Froman <[email protected]> 21-Apr-10 2:34:40 PM >>>

Or in this case, using terminology that is clearly a nonsensical violation of the obvious. Saying that you have computed a negative coefficient of determination, for example, would not be so obviously ludicrous as saying you have computed a negative r-squared (if you know that squaring any value, positive or negative, can't produce a negative number).

By the way, I hope a negative F is more than rare. Given that it is a ratio of between-group and within-group variances (and variances, like areas, cannot be negative), if there is some "correction" or "adjustment" that produces a negative F, I can't see how it would be anything but meaningless (given that the probability of a negative F as taken from the F-distribution is zero -- none of the area under the F-curve lies to the left of 0). Are there also procedures that produce negative p-values?

Rick

Dr. Rick Froman, Chair
Division of Humanities and Social Sciences
Box 3055
x7295
[email protected]
http://tinyurl.com/DrFroman

Proverbs 14:15 "A simple man believes anything, but a prudent man gives thought to his steps."

-----Original Message-----
From: Mike Palij [mailto:[email protected]]
Sent: Wednesday, April 21, 2010 12:20 PM
To: Teaching in the Psychological Sciences (TIPS)
Cc: Mike Palij
Subject: RE: Re:[tips] Biserial r.
On Wed, 21 Apr 2010 05:48:50 -0700, Rick Froman wrote:
>OK, I know that some correlational techniques occasionally produce r greater
>than 1 or less than -1 but I think I am on firm footing when I say that I am
>not going to see a negative r-squared in the set of real numbers used in
>statistical calculations (although it may occur with complex numbers
>http://mathforum.org/library/drmath/view/52613.html ).

Unfortunately, this is not true. A simple Google search for either "negative R squared" or "negative R square" will provide a variety of hits. A number of conditions can give rise to a negative R square, but they all tend to be pathological.

If you used the regression tool in pre-2003 Excel and you forced the regression through the origin (i.e., the intercept is zero), you could get a negative R-square as well as negative sums of squares, etc. (it is somewhat unusual to see a negative F-value in the output). Microsoft fixed the code that created these results in Excel 2003 and later versions; see the following website (scroll down to "Regression" or search for the word "negative" on the page):
http://support.microsoft.com/default.aspx?scid=kb;en-us;829208

A negative R square can also be obtained in multilevel or HLM analyses. Consider the following, which attempts to identify the variance accounted for in a hierarchical model:

|Socioeconomic status explains 45% of the explainable between-unit
|variance in this model using the first formula and 59% using the second
|formula. Thus, it appears that socioeconomic status contributes greatly
|to explaining variation between schools, but does not explain much
|variance in math achievement scores.
|
|It should be noted that there are some potential problems with the method
|described above. One possible problem is the possibility that the level-1
|variance is larger in the restricted model than the unrestricted model, which
|would produce negative R-squared values.
|Kreft and De Leeuw (1998) point
|out that the formula may not apply to situations where there are random
|intercepts. This is especially true for computing the between-unit variance
|explained, as there is not a single level-2 error term in models containing
|random slopes.

from: http://ssc.utexas.edu/software/faqs/hlm
The question is "R-squared in a Hierarchical Model", which is lower on the page.

The cited reference for Kreft & De Leeuw is:
Kreft, I., & De Leeuw, J. (1998). Introducing Multilevel Modeling. London: Sage Publications.

So, it is possible to get oddball values for statistics, and for a variety of reasons, ranging from improperly programmed procedures to situations where key assumptions are violated. In either case, one has to think through what is going on.

-Mike Palij
New York University
[email protected]

---
You are currently subscribed to tips as: [email protected]. To unsubscribe click here: http://fsulist.frostburg.edu/u?id=13039.37a56d458b5e856d05bcfb3322db5f8a&n=T&l=tips&o=2142 or send a blank email to leave-2142-13039.37a56d458b5e856d05bcfb3322db5...@fsulist.frostburg.edu
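Both pathologies in Mike's message can be sketched in a few lines, assuming the usual definitions (the data and variance numbers are hypothetical, chosen only to trigger the sign flip): R**2 computed against the mean of y after forcing the least-squares fit through the origin, and an HLM-style pseudo-R**2 as a proportional reduction in level-1 variance.

```python
from statistics import mean

def r2_through_origin(xs, ys):
    """Fit y = b*x with no intercept, then compute R**2 = 1 - SSE/SST
    with SST taken about the mean of y, as usual.  When the data have a
    large intercept, the no-intercept model fits worse than simply
    predicting ybar, so SSE > SST and R**2 goes negative."""
    b = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)
    sse = sum((y - b * x) ** 2 for x, y in zip(xs, ys))
    sst = sum((y - mean(ys)) ** 2 for y in ys)
    return 1 - sse / sst

def hlm_pseudo_r2(var_null, var_model):
    """(sigma2_null - sigma2_model) / sigma2_null: negative whenever the
    restricted model's level-1 variance exceeds the unrestricted one."""
    return (var_null - var_model) / var_null

# Nearly flat y with a large intercept, forced through the origin:
xs = list(range(1, 11))
ys = [98.0, 101.0, 99.0, 102.0, 100.0, 99.0, 101.0, 98.0, 102.0, 100.0]
print(r2_through_origin(xs, ys))   # well below zero

# Level-1 variance larger in the restricted model than the unrestricted one:
print(hlm_pseudo_r2(4.0, 4.6))     # negative "variance explained"
```

In both cases the formula is doing exactly what it was told to do; the negative value is a signal that one of its assumptions (a fitted intercept, or variances that shrink as predictors are added) does not hold.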
