RE: [tips] Spurious Correlations

Mike Palij Fri, 10 Oct 2014 08:10:42 -0700

On Fri, 10 Oct 2014 06:41:38 -0700, Rick Froman wrote:

One thing I have found helpful in teaching the concept of
spurious correlations is to have students populate a number
of columns in a spreadsheet with random numbers and then
calculate correlations between all the columns of random
numbers. Since they are random, the correlation in the population
from which all of these samples are drawn is 0. For every
100 correlations calculated in this circumstance, using a .05
alpha level, students will find about five spurious correlations
that are statistically significant but are clearly spurious
(mind blown) :)
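
For anyone without a spreadsheet handy, the exercise above can be sketched in a few lines of Python. This is only a rough illustration under assumptions of my own: 30 rows per column, normally distributed random numbers, and a two-tailed test at alpha = .05 using the tabled critical t of 2.048 for df = 28. The names pearson_r, significant_fraction, N_ROWS, T_CRIT and R_CRIT are made up for the example.

```python
import itertools
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Assumed setup: 30 rows per column, so df = 28 for each correlation.
# The two-tailed critical |r| at alpha = .05 is derived from the
# tabled critical t of 2.048 via r = t / sqrt(t^2 + df).
N_ROWS = 30
T_CRIT = 2.048
R_CRIT = T_CRIT / math.sqrt(T_CRIT ** 2 + (N_ROWS - 2))


def significant_fraction(n_cols, seed=0):
    """Fill n_cols columns with random normal numbers and return
    (number of column pairs, fraction of pairs with |r| > R_CRIT)."""
    rng = random.Random(seed)
    cols = [[rng.gauss(0, 1) for _ in range(N_ROWS)] for _ in range(n_cols)]
    pairs = list(itertools.combinations(cols, 2))
    hits = sum(1 for a, b in pairs if abs(pearson_r(a, b)) > R_CRIT)
    return len(pairs), hits / len(pairs)


if __name__ == "__main__":
    # 15 columns give 105 pairwise correlations, close to the "100" above.
    n_pairs, frac = significant_fraction(n_cols=15)
    print(f"{n_pairs} correlations, {frac:.1%} significant at .05")
```

With only 15 columns the count of "significant" results bounces around from seed to seed; with many more columns the fraction should settle near .05, since every population correlation here is 0.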


I like this but it works primarily as a mathematical exercise.
The real issue is how to translate what one learns from such
exercises to real-life research situations where one is calculating
correlations between variables.  Unless one knows the real-life
situation/phenomenon really well, one won't know whether a
statistically significant correlation is real or a Type I error.

A minor point:  technically the example you provide above is
not an example of a spurious correlation; rather, it is an
example of making Type I errors.  Consider the following
distinctions, partly based on Haig's writing (ref below):

(1)  Nonsense Correlations: we have two variables X and Y and
they are correlated  X <--> Y but there is no reasonable or plausible
explanation for why such a correlation exists.  Haig uses the example
of the high positive correlation between human birth rate and the
number of storks in Great Britain during a period of time (see Haig
p127).  Haig notes that Kendall & Buckland (1982), in their dictionary
of statistical terms, define such a result as an "illusory correlation".
The correlation appears to be real, possibly due to a "butterfly
effect" (see below), but is not easily explainable.

(2)  Traditional "Spurious Correlations": we have three variables
X, Y, and Z and X and Y are not directly related at all but both are
dependent upon Z or X <-- Z --> Y.  One example I use is this:  if
you take all of the cities in the U.S. with population over 100,000
and make Y = number of crimes committed and X = number of
churches in each city, you will probably find a positive correlation
between number of crimes and number of churches.  The
simple-minded solution to eliminating this relationship would be to
get rid of the churches ("Just Say No!") and crime should disappear.
However, smaller cities should have both fewer crimes and churches
and larger cities should have both more crimes and churches.
So the correlation is probably due to population size:  control or
partial out the relationship of population size to the number of
crimes and churches and you'll probably find that the correlation
disappears.  If it doesn't, then consider closing the churches. ;-)
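
Here is a small sketch of what "partialling out" population size looks like, using the usual first-order partial correlation formula. To be clear, the data are simulated and every number in it (500 cities, the slopes, the noise levels) is invented for illustration, not taken from any real crime or church counts. The function names are hypothetical too.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def partial_r(r_xy, r_xz, r_yz):
    """Correlation of X and Y with Z partialled out of both."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


def crime_church_correlations(n_cities=500, seed=1):
    """Simulate X <-- Z --> Y and return (r_xy, r_xy with Z partialled out).

    All coefficients below are made up; only the structure matters:
    churches and crimes each depend on population and on nothing else.
    """
    rng = random.Random(seed)
    # Z: population in thousands (assumed range 100 to 2000).
    population = [rng.uniform(100, 2000) for _ in range(n_cities)]
    # X: churches depend on population plus independent noise.
    churches = [100 + 0.5 * p + rng.gauss(0, 60) for p in population]
    # Y: crimes depend on population plus independent noise.
    crimes = [2000 + 20 * p + rng.gauss(0, 3000) for p in population]

    r_xy = pearson_r(churches, crimes)
    r_xz = pearson_r(churches, population)
    r_yz = pearson_r(crimes, population)
    return r_xy, partial_r(r_xy, r_xz, r_yz)


if __name__ == "__main__":
    r_xy, r_xy_given_z = crime_church_correlations()
    print(f"r(churches, crimes)              = {r_xy:.2f}")
    print(f"r(churches, crimes | population) = {r_xy_given_z:.2f}")
```

In this simulated setup the raw correlation is large and the partial correlation sits near zero, because by construction churches and crimes share nothing except population.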

(3)  Haig's "Spurious Correlations": We have three variables
X, Y, and Z and X is related to Y but is mediated by Z, that is,
X --> Z ---> Y.  This is an "indirect correlation" (in contrast to
a direct correlation X <---> Y which is not dependent upon a
third variable) and is of interest in its own right.  Indeed, mediation
and moderation analysis is a popular method of analysis, especially
for correlational and quasi-experimental designs.
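
As a rough illustration of an indirect correlation, here is a simulated linear chain X --> Z --> Y in which X affects Y only through Z. The path weights (0.8) and noise levels (0.6) are assumptions picked so that each variable has variance near 1; nothing here comes from Haig's article. In a chain like this, the X-Y correlation should come out close to the product of the two links, r(X,Z) * r(Z,Y).

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def chain_correlations(n=2000, seed=2):
    """Simulate X --> Z --> Y and return (r_xz, r_zy, r_xy).

    The weights and noise SDs are invented for the example.
    """
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(n)]
    # Z depends on X only.
    z = [0.8 * xi + rng.gauss(0, 0.6) for xi in x]
    # Y depends on Z only; X never enters this line directly.
    y = [0.8 * zi + rng.gauss(0, 0.6) for zi in z]
    return pearson_r(x, z), pearson_r(z, y), pearson_r(x, y)


if __name__ == "__main__":
    r_xz, r_zy, r_xy = chain_correlations()
    print(f"r(X,Z) = {r_xz:.2f}, r(Z,Y) = {r_zy:.2f}")
    print(f"r(X,Y) = {r_xy:.2f} vs. product of links = {r_xz * r_zy:.2f}")
```

The X-Y correlation here is real and substantial even though X has no direct path to Y, which is why this kind of correlation is of interest rather than something to be explained away.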

So, spurious correlations can be tricky things, especially when
dealing with correlations from uncontrolled situations and/or
when one has limited knowledge of the phenomenon being studied.

-Mike Palij
New York University
[email protected]


-----Original Message-----
On Friday, October 10, 2014 8:17 AM, Mike Palij wrote:

On Thu, 09 Oct 2014 18:23:19 -0700, Carol DeVolder wrote:

Perhaps others are familiar with this site, but I wasn't. It's a fun
collection of spurious correlations. Good for examples in class.
http://tylervigen.com/

For people interested in such things, I suggest one take a look at some of
Brian Haig's writing on spurious correlations, which provides a more
"nuanced" perspective on them (one can classify spurious correlations into
those that are truly spurious versus those that are not). Here's the
reference for one of his articles:

Haig, B. D. (2003). What is a spurious correlation? Understanding
Statistics: Statistical Issues in Psychology, Education, and the Social
Sciences, 2(2), 125-132.

http://www.tandfonline.com/doi/abs/10.1207/S15328031US0202_03#preview:
or
http://psycnet.apa.org/psycinfo/2004-12710-003

A key point is whether a correlation represents a direct "effect" or
relationship (which is typically assumed in a correlational analysis) or
whether an indirect "effect" or relationship exists between two or more
variables.

If we have three variables X, Y, and Z, and

(1) there is no direct relationship between X and Y
but
(2) there is an indirect relationship X -> Z -> Y

then this raises thorny questions of mediation and moderation, which I will
leave to Karl Wuensch to elaborate (or to provide access to his notes on
these topics ;-).

Haig would probably call the correlations provided on the Tyler Vigen
website "nonsense correlations" but, for fans of the belief that "everything
is connected to everything else", one might refer to the "butterfly effect".

The butterfly effect refers to two conceptually unrelated events (apparently
nonsensical) that are nonetheless connected by a complex nonlinear
relationship.

Simple correlational analyses that (a) do not have the necessary
intermediate variables, and/or (b) do not have the necessary nonlinear
terms, will not accurately represent the relationship or, more correctly,
the process that connects two variables.
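
A toy sketch of point (b), leaving out a necessary nonlinear term. This is a far simpler stand-in than a true butterfly effect: I am assuming Y is just X squared plus a little noise, with X symmetric around zero. All names and numbers are invented for the example.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def curve_correlations(n=5000, seed=3):
    """Return (r of Y with X, r of Y with X squared) when Y = X^2 + noise.

    The quadratic form and the noise SD of 0.05 are assumptions
    chosen only to make the point visible.
    """
    rng = random.Random(seed)
    x = [rng.uniform(-1, 1) for _ in range(n)]
    y = [xi ** 2 + rng.gauss(0, 0.05) for xi in x]
    r_linear = pearson_r(x, y)
    r_with_term = pearson_r([xi ** 2 for xi in x], y)
    return r_linear, r_with_term


if __name__ == "__main__":
    r_linear, r_with_term = curve_correlations()
    print(f"r(X, Y)   = {r_linear:.2f}  (linear term only)")
    print(f"r(X^2, Y) = {r_with_term:.2f}  (nonlinear term included)")
```

Y is almost completely determined by X in this setup, yet the simple linear correlation comes out near zero; only when the squared term is included does the relationship show up.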

Just something to think about. ;-)

