Hi
Warning ... a simulation follows that may be of interest only to those
"really" interested in the question of restriction of range and its correction.
A. Effect of Selection on SAT on SD of GPA
I initially set out to demonstrate that the SD for GPA would shrink with
selection on SAT, to verify one point of discussion in this thread. The SPSS
program at the end of this message generates 100,000 SAT scores (Mu = 500,
Sigma = 100) and GPAs (Mu = 2.5, Sigma = .5) from a population with Rho = .71
(i.e., about 50% of the variance in GPA is predicted by SAT).
I then selected successive samples on the basis of SAT scores, giving the
following results.
Criterion       N  SD SAT  SD GPA    r
No select    100K     100     .50  .71
SAT>500      ~50K      59     .41  .52
SAT>600      ~16K      44     .38  .40
SAT>700       ~2K      33     .37  .31
SAT>800       132      22     .33  .28
SD SAT and the correlation both shrink, as expected, given selection was on
SAT. But the SD GPA also is reduced, albeit not as markedly as SD for SAT.
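The generation and selection steps described above can be sketched in Python. This is a minimal re-creation under stated assumptions, not the SPSS program itself: the function names (simulate, describe, sd, pearson_r) and the seed are made up for illustration, and because the random numbers differ from the SPSS run, the output will match the table only approximately.

```python
import math
import random

def simulate(n=100_000, rho=0.7071, seed=0):
    """Return lists of SAT and GPA scores drawn from a bivariate normal population."""
    rng = random.Random(seed)  # seed is an arbitrary choice for repeatability
    sat, gpa = [], []
    for _ in range(n):
        z_sat = rng.gauss(0, 1)
        # Mix SAT with independent noise so the population correlation is rho.
        z_gpa = rho * z_sat + math.sqrt(1 - rho**2) * rng.gauss(0, 1)
        sat.append(round(500 + 100 * z_sat))  # SAT: mean 500, SD 100, whole numbers
        gpa.append(2.5 + 0.5 * z_gpa)         # GPA: mean 2.5, SD .5
    return sat, gpa

def sd(values):
    """Sample standard deviation (n - 1 in the denominator)."""
    m = sum(values) / len(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def describe(sat, gpa, cutoff=None):
    """Return (N, SD SAT, SD GPA, r) for cases with SAT above the cutoff."""
    pairs = [(s, g) for s, g in zip(sat, gpa) if cutoff is None or s > cutoff]
    s_sel = [p[0] for p in pairs]
    g_sel = [p[1] for p in pairs]
    return len(pairs), sd(s_sel), sd(g_sel), pearson_r(s_sel, g_sel)

if __name__ == "__main__":
    sat, gpa = simulate()
    for cutoff in (None, 500, 600, 700, 800):
        n, sd_sat, sd_gpa, r = describe(sat, gpa, cutoff)
        label = "No select" if cutoff is None else f"SAT>{cutoff}"
        print(f"{label:<10}{n:>8}{sd_sat:>8.1f}{sd_gpa:>7.2f}{r:>6.2f}")
```

Selection is applied to the same simulated file each time, mirroring the repeated temp. / select if steps in the SPSS listing.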
B. Accuracy of Correction for Restriction of Range
Given that I had the datasets, I then decided to test something I had asked
Ken about, namely, how good the correction for restriction of range is. I read
in the values of SD SAT (i.e., SDx) and r for the different degrees of
selection and then computed the standard correction for what is called direct
restriction of range (which is what I produced by selecting on SAT). The
corrected values are shown below in the columns headed rho1 to rho5. The
columns are identical because they are equivalent versions of the same
formula: I made a mistake initially and was so surprised at the weird results
that I kept looking for and implementing other versions, as shown in the SPSS
commands at the end. The values were no longer weird once I corrected my
mistake.
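For reference, the standard correction for direct restriction of range (commonly cited as Thorndike's Case II) can be written as below, where r is the correlation in the selected sample, s_x is the SD of the predictor in that sample, and sigma_x is its SD in the unselected population. The symbol U is introduced here only for convenience; this is the same arrangement as rho3 in the SPSS commands at the end of this message.

```latex
\hat{\rho} = \frac{U\,r}{\sqrt{1 + r^{2}\,(U^{2} - 1)}},
\qquad U = \frac{\sigma_x}{s_x}
```

As a worked case, with r = .52, s_x = 59 and sigma_x = 100, U is about 1.695, the numerator is about .881, the denominator is about 1.227, and the corrected value is about .718.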
To illustrate, the third row selects cases above 600 (1 SD above the mean),
which amounts to about 16% of the 100,000 scores. The sample r of .40 produces
an estimated population rho of .70, quite close to the actual value of .71.
With extreme selection (bottom row, SAT > 800, .13% of scores), the formula
appears to overcorrect, although that estimate rests on only 132 cases.
sdx     r      sigx    sigx2  rho1   rho2   rho3   rho4   rho5
100.00  .7100  100.00  10000  .7100  .7100  .7100  .7100  .7100
59.000  .5200  100.00  10000  .7181  .7181  .7181  .7181  .7181
44.000  .4000  100.00  10000  .7042  .7042  .7042  .7042  .7042
33.000  .3100  100.00  10000  .7029  .7029  .7029  .7029  .7029
22.000  .2800  100.00  10000  .7984  .7984  .7984  .7984  .7984
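The corrected values in the table above can be reproduced with a short Python sketch of the direct-restriction correction. The function name correct_direct_restriction and its parameter names are made up for illustration; the input values are the SDx and r pairs from the table, with the population SD of SAT taken as 100.

```python
import math

def correct_direct_restriction(r, sd_restricted, sd_population):
    """Correct a sample r for direct restriction of range on the predictor.

    r              correlation in the selected sample
    sd_restricted  SD of the predictor in the selected sample
    sd_population  SD of the predictor in the unselected population
    """
    u = sd_population / sd_restricted
    return (u * r) / math.sqrt(1 + r**2 * (u**2 - 1))

if __name__ == "__main__":
    # (SD of SAT, r) for each degree of selection, copied from the table above.
    rows = [(100, 0.71), (59, 0.52), (44, 0.40), (33, 0.31), (22, 0.28)]
    for sdx, r in rows:
        rho = correct_direct_restriction(r, sdx, 100)
        print(f"{sdx:>5} {r:.2f} {rho:.4f}")
```

When the sample SD equals the population SD (first row), u is 1 and the function returns r unchanged.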
The articles I briefly looked at to obtain the various formulas certainly made
clear that the entire issue is far more complicated than I had appreciated
before this discussion. Factors considered include such things as sampling
ratio, shape of distributions, and the underlying basis for selection. A
recent article, for example, observed that the standard correction that I used
generally UNDERCORRECTS for what is called indirect restriction of range. This
involves selection on the basis of some third variable related to X and Y, and
is thought to characterize many selection situations. That is, actual
selection is seldom JUST on the predictor being adjusted. The authors
speculate that many current findings might need to be reconsidered and that
the relationships might actually be stronger than previously thought. See
Schmidt et al., 2006, Personnel Psychology.
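The idea of indirect restriction can be illustrated with a small Python simulation. This is only a sketch under loudly stated assumptions: the third variable z and its correlations with X and Y (r_xz = r_yz = .6) are hypothetical values picked for illustration, selection keeps the top half on z, and all function names are made up. It is not the model or the correction proposed in the cited article, and other choices of correlations can behave differently.

```python
import math
import random

def sd(values):
    """Sample standard deviation (n - 1 in the denominator)."""
    m = sum(values) / len(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def correct_direct_restriction(r, sd_restricted, sd_population):
    """Standard correction for direct restriction of range on the predictor."""
    u = sd_population / sd_restricted
    return (u * r) / math.sqrt(1 + r**2 * (u**2 - 1))

def indirect_selection_demo(n=100_000, r_xy=0.7071, r_xz=0.6, r_yz=0.6, seed=0):
    """Select on a third variable z, then apply the direct-restriction correction.

    r_xz and r_yz are hypothetical values chosen only for illustration.
    Returns (r in full sample, r in selected sample, corrected r).
    """
    rng = random.Random(seed)
    # Weights that give x, y, z unit variance and the requested correlations.
    a = math.sqrt(1 - r_xz**2)
    b = (r_xy - r_xz * r_yz) / a
    c = math.sqrt(1 - r_yz**2 - b**2)
    x_all, y_all, x_sel, y_sel = [], [], [], []
    for _ in range(n):
        e1, e2, e3 = rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)
        z = e1
        x = r_xz * e1 + a * e2
        y = r_yz * e1 + b * e2 + c * e3
        x_all.append(x)
        y_all.append(y)
        if z > 0:  # selection is on z (top half), not on x
            x_sel.append(x)
            y_sel.append(y)
    r_full = pearson_r(x_all, y_all)
    r_selected = pearson_r(x_sel, y_sel)
    corrected = correct_direct_restriction(r_selected, sd(x_sel), sd(x_all))
    return r_full, r_selected, corrected

if __name__ == "__main__":
    r_full, r_selected, corrected = indirect_selection_demo()
    print(f"r in full sample:      {r_full:.3f}")
    print(f"r in selected sample:  {r_selected:.3f}")
    print(f"corrected (direct):    {corrected:.3f}")
```

Under these particular assumptions the corrected value should fall, apart from sampling error, between the selected-sample r (about .62) and the full-sample r (about .71), at roughly .67, so the direct-restriction formula recovers only part of the lost correlation.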
Take care
Jim
The SPSS programs appear below.
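* Generate 100,000 cases of standard normal sat and gpa correlated about .71.
input program.
loop o = 1 to 100000.
comp sat = rv.norm(0,1).
comp gpa = rv.norm(0,1)*.7071 + sat*.7071.
end case.
end loop.
end file.
end input program.
* Rescale to the SAT metric (mean 500, SD 100) and the GPA metric (mean 2.5, SD .5).
comp sat = rnd(500 + sat*100).
comp gpa = 2.5 + gpa*.5.
corr gpa sat /stat.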
       Mean        Std. Deviation  N
gpa    2.498497    .4986521        100000
sat    499.736190  99.6350487      100000
               gpa  sat
gpa  Pearson   1    .706
* Each temp. limits the selection to the next procedure, so the full file is kept.
temp.
select if sat > 500.
corr gpa sat /stat.
       Mean        Std. Deviation  N
gpa    2.779867    .4127305        49824
sat    579.504335  59.7406878      49824
               gpa  sat
gpa  Pearson   1    .516
temp.
select if sat > 600.
corr gpa sat /stat.
       Mean        Std. Deviation  N
gpa    3.040882    .3842720        15578
sat    652.300488  44.1317182      15578
               gpa  sat
gpa  Pearson   1    .400
temp.
select if sat > 700.
corr gpa sat /stat.
       Mean        Std. Deviation  N
gpa    3.334208    .3693546        2184
sat    737.050824  33.3068533      2184
               gpa  sat
gpa  Pearson   1    .309
temp.
select if sat > 800.
corr gpa sat /stat.
       Mean        Std. Deviation  N
gpa    3.632516    .3280184        132
sat    824.863636  21.9338954      132
               gpa  sat
gpa  Pearson   1    .279
*based on n = 100k.
data list free / sdx r.
begin data
100 .71
59 .52
44 .40
33 .31
22 .28
end data.
* Population SD of SAT and its square.
comp sigx = 100.
comp sigx2 = sigx**2.
* Five equivalent versions of the correction for direct restriction of range.
comp rho1 = (sigx*r)/sqrt(sdx**2*(1-r**2)+sigx2*r**2).
comp rho2 = (r*sigx/sdx)/sqrt(1 - r**2 + (r**2)*(sigx/sdx)**2).
comp rho3 = (sigx/sdx)*r/sqrt(((sigx/sdx)**2)*r**2 - r**2 + 1).
comp rho4 = r/sqrt(r**2 + (1-r**2)*sdx**2/sigx2).
comp rho5 = ((1/(sdx/sigx))*r)/sqrt((1/(sdx/sigx)**2 - 1)*r**2 + 1).
list.
sdx     r      sigx    sigx2  rho1   rho2   rho3   rho4   rho5
100.00  .7100  100.00  10000  .7100  .7100  .7100  .7100  .7100
59.000  .5200  100.00  10000  .7181  .7181  .7181  .7181  .7181
44.000  .4000  100.00  10000  .7042  .7042  .7042  .7042  .7042
33.000  .3100  100.00  10000  .7029  .7029  .7029  .7029  .7029
22.000  .2800  100.00  10000  .7984  .7984  .7984  .7984  .7984
James M. Clark
Professor of Psychology
204-786-9757
204-774-4134 Fax
[EMAIL PROTECTED]