Hi

In most of the examples given, I think people are being too harsh in saying that the results are ridiculous or meaningless. When one "estimates" some hypothetical value, the underlying value may be so close to a maximum or a minimum that the estimate falls above the max or below the min. For example, estimating r**2 in the population (i.e., rho**2) from the sample r might produce a negative r**2. Or, when calculating a biserial r to estimate what the correlation would be if a dichotomized x were continuous, the estimate might end up greater than +1 or less than -1. Rather than being meaningless, the former indicates to me that the best estimate of the population r**2 is 0, and the latter that r for a continuous x is estimated to be at or close to +1 (or -1).
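The pattern described above is easy to reproduce. The following is a minimal sketch (an illustrative simulation, not the SPSS one circulated in this thread): draw samples from a bivariate normal with a known rho, dichotomize x at zero, and compute the biserial r from the usual formula rb = (M1 - M0)pq/(h*sy), where h is the standard-normal ordinate at the split point. The sample size, rho, and number of replications are arbitrary choices for illustration.

```python
import math
import random
from statistics import NormalDist, mean, pstdev

ND = NormalDist()

def biserial_r(d, y):
    """Biserial r from a 0/1 indicator d and a continuous y,
    via rb = (mean_1 - mean_0) * p * q / (h * s_y)."""
    n = len(d)
    p = sum(d) / n                        # proportion of 1s
    q = 1.0 - p
    y1 = [yy for dd, yy in zip(d, y) if dd == 1]
    y0 = [yy for dd, yy in zip(d, y) if dd == 0]
    h = ND.pdf(ND.inv_cdf(p))             # ordinate at the dichotomization point
    return (mean(y1) - mean(y0)) * p * q / (h * pstdev(y))

random.seed(1)
rho, n = 0.95, 20                         # strong rho, small n: estimates press the ceiling
rbs = []
for _ in range(1000):
    x = [random.gauss(0, 1) for _ in range(n)]
    e = [random.gauss(0, 1) for _ in range(n)]
    y = [rho * xi + math.sqrt(1 - rho**2) * ei for xi, ei in zip(x, e)]
    d = [1 if xi > 0 else 0 for xi in x]  # dichotomize x at its population median
    if 0 < sum(d) < n:                    # skip degenerate splits
        rbs.append(biserial_r(d, y))

print(f"max rb = {max(rbs):.3f}, mean rb = {mean(rbs):.3f}")
```

Individual samples produce rb greater than +1, yet the mean across samples stays close to rho, which is the point: the "impossible" values belong in the average.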
The critical thing in these examples is that you are NOT describing some property of the data, but rather estimating some hypothetical property on the basis of the data. And estimates can always fall on either side of the actual value, even if the region on one side (or parts of it) is impossible.

If there were some adjustment that produced a negative F, as in Rick's example below, it is important to note that the relevant p is NOT the probability of that F but rather the probability of an F that size OR GREATER given whatever the null hypothesis is; that is, p values are areas under distributions across some range of values for your statistic. Under those circumstances, I would say that the appropriate p to report would be 1.0, just as for F = 0, which is possible. This is the same manner in which Bonferroni ps greater than 1 are reported as 1, as mentioned in one of my examples of "misbehaving" stats.

I'm sure we would never make the following mistake, but one danger of saying statistic X is meaningless given its calculated value is that naive users might use that as an excuse to ignore X and report some acceptable, but perhaps less correct, statistic. For example: my R**2 adjusted was negative, so I will ignore it and report R**2 (which, ironically, is more likely to be "meaningless" than R**2 adjusted given a sufficient number of predictors and a small sample size). People here were clearly recommending a more thoughtful approach.

Or, in the SPSS simulation of rb that I distributed, it would be incorrect, I think, to ignore the rbs > 1 or < -1 in determining the expected value of rb. I noted briefly that the mean rb of the 1,000 samples (10,000 in some other simulations I ran) was very close to rho between continuous X and Y. But there would be circumstances where the fit would appear to be biased if "deviant" samples were ignored.

None of this is meant to undermine the many valid reservations about, for example, the biserial r or other statistics.
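The "misbehaving" statistics in this exchange reduce to simple arithmetic. A minimal sketch with hypothetical numbers: the adjusted R**2 formula going negative when R**2 is small relative to predictors and sample size, the Bonferroni p capped at 1, and a one-way F computed from data, which as a ratio of two sums of squared deviations cannot be negative.

```python
from statistics import mean

def adjusted_r2(r2, n, k):
    """Adjusted R**2 = 1 - (1 - R**2)(n - 1)/(n - k - 1)."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

def bonferroni(p, m):
    """Bonferroni-corrected p for m comparisons, reported as 1 when the product exceeds 1."""
    return min(p * m, 1.0)

def one_way_F(groups):
    """F = MS_between / MS_within for a one-way ANOVA."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand = mean(x for g in groups for x in g)
    ssb = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ssw = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ssb / (k - 1)) / (ssw / (n - k))

# Small R**2, many predictors, few cases: the adjustment goes negative,
# which reads as "best estimate of the population R**2 is zero".
print(adjusted_r2(0.05, n=20, k=5))

# A raw Bonferroni product can exceed 1; it is reported as 1.
print(bonferroni(0.04, m=30))

# Hypothetical groups: F is a ratio of nonnegative quantities.
groups = [[4.0, 5.0, 6.0], [5.0, 6.0, 7.0], [8.0, 9.0, 10.0]]
print(one_way_F(groups))
```

The F computed from data can reach 0 (when all group means are equal) but never pass below it; anything negative would have to come from some later "adjustment", which is where the p = 1.0 reading applies.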
I'm just less certain than others that an "impossible" value for some statistic itself allows a ready judgment about its appropriateness.

Take care
Jim

James M. Clark
Professor of Psychology
204-786-9757
204-774-4134 Fax
[email protected]

>>> Rick Froman <[email protected]> 21-Apr-10 2:34:40 PM >>>

Or in this case, using terminology that is clearly a nonsensical violation of the obvious. Saying that you have computed a negative coefficient of determination, for example, would not be so obviously ludicrous as saying you have computed a negative r-squared (if you know that squaring any value, positive or negative, can't produce a negative number).

By the way, I hope a negative F is more than rare. Given that it is a ratio of between-group and within-group variances (and variances, like areas, cannot be negative), if there is some "correction" or "adjustment" that produces a negative F, I can't see how it would be anything but meaningless (given that the probability of a negative F as taken from the F-distribution is zero -- none of the area under the F-curve lies to the left of 0). Are there also procedures that produce negative p-values?

Rick

Dr. Rick Froman, Chair
Division of Humanities and Social Sciences
Box 3055
x7295
[email protected]
http://tinyurl.com/DrFroman

Proverbs 14:15 "A simple man believes anything, but a prudent man gives thought to his steps."

-----Original Message-----
From: Mike Palij [mailto:[email protected]]
Sent: Wednesday, April 21, 2010 12:20 PM
To: Teaching in the Psychological Sciences (TIPS)
Cc: Mike Palij
Subject: RE: Re:[tips] Biserial r.
On Wed, 21 Apr 2010 05:48:50 -0700, Rick Froman wrote:
>OK, I know that some correlational techniques occasionally produce r greater
>than 1 or less than -1 but I think I am on firm footing when I say that I am
>not going to see a negative r-squared in the set of real numbers used in
>statistical calculations (although it may occur with complex numbers
>http://mathforum.org/library/drmath/view/52613.html ).

Unfortunately, this is not true. A simple Google search for either "negative R squared" or "negative R square" will provide a variety of hits. A number of conditions can give rise to a negative R square, but they all tend to be pathological.

If you used the regression tool in pre-2003 Excel and you forced the regression through the origin (i.e., the intercept is zero), you could get a negative R-square as well as negative sums of squares, etc. (it is somewhat unusual to see a negative F-value in the output). Microsoft fixed the code that created these results in Excel 2003 and later versions; see the following website (scroll down to "Regression" or search for the word "negative" on the page):
http://support.microsoft.com/default.aspx?scid=kb;en-us;829208

A negative R square can also be obtained in multilevel or HLM analyses. Consider the following, which attempts to identify the variance accounted for in a hierarchical model:

|Socioeconomic status explains 45% of the explainable between-unit
|variance in this model using the first formula and 59% using the second
|formula. Thus, it appears that socioeconomic status contributes greatly
|to explaining variation between schools, but does not explain much
|variance in math achievement scores.
|
|It should be noted that there are some potential problems with the method
|described above. One possible problem is the possibility that the level-1
|variance is larger in the restricted model than the unrestricted model, which
|would produce negative R-squared values.
|Kreft and De Leeuw (1998) point
|out that the formula may not apply to situations where there are random
|intercepts. This is especially true for computing the between-unit variance
|explained, as there is not a single level-2 error term in models containing
|random slopes.

from: http://ssc.utexas.edu/software/faqs/hlm
The question is "R-squared in a Hierarchical Model", which is lower on the page.

The cited reference for Kreft & De Leeuw is:
Kreft, I., & De Leeuw, J. (1998). Introducing Multilevel Modeling. London: Sage Publications.

So, it is possible to get oddball values for statistics, and for a variety of reasons, ranging from improperly programmed procedures to situations where key assumptions are violated. In either case, one has to think through what is going on.

-Mike Palij
New York University
[email protected]

---
You are currently subscribed to tips as: [email protected]. To unsubscribe click here: http://fsulist.frostburg.edu/u?id=13039.37a56d458b5e856d05bcfb3322db5f8a&n=T&l=tips&o=2142 or send a blank email to leave-2142-13039.37a56d458b5e856d05bcfb3322db5...@fsulist.frostburg.edu
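Both pathologies in Mike's message can be sketched in a few lines, assuming the usual definitions (the data and variance numbers are hypothetical, chosen only to trigger the sign flip): R**2 computed against the mean of y after forcing the least-squares fit through the origin, and an HLM-style pseudo-R**2 as a proportional reduction in level-1 variance.

```python
from statistics import mean

def r2_through_origin(xs, ys):
    """Fit y = b*x with no intercept, then compute R**2 = 1 - SSE/SST
    with SST taken about the mean of y, as usual.  When the data have a
    large intercept, the no-intercept model fits worse than simply
    predicting ybar, so SSE > SST and R**2 goes negative."""
    b = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)
    sse = sum((y - b * x) ** 2 for x, y in zip(xs, ys))
    sst = sum((y - mean(ys)) ** 2 for y in ys)
    return 1 - sse / sst

def hlm_pseudo_r2(var_null, var_model):
    """(sigma2_null - sigma2_model) / sigma2_null: negative whenever the
    restricted model's level-1 variance exceeds the unrestricted one."""
    return (var_null - var_model) / var_null

# Nearly flat y with a large intercept, forced through the origin:
xs = list(range(1, 11))
ys = [98.0, 101.0, 99.0, 102.0, 100.0, 99.0, 101.0, 98.0, 102.0, 100.0]
print(r2_through_origin(xs, ys))   # well below zero

# Level-1 variance larger in the restricted model than the unrestricted one:
print(hlm_pseudo_r2(4.0, 4.6))     # negative "variance explained"
```

In both cases the formula is doing exactly what it was told to do; the negative value is a signal that one of its assumptions (a fitted intercept, or variances that shrink as predictors are added) does not hold.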
