RE: [tips] Spurious Correlations

Jim Clark Fri, 10 Oct 2014 08:20:18 -0700

Hi

A lot of the discussion of how to interpret correlations involves the presence 
of a simple correlation, as in the spurious correlation examples. It is equally 
important to emphasize to students that the absence of correlation is subject 
to all the same concerns. That is, absence of correlation does not imply 
absence of relationship between X and Y because of all the same mechanisms. For 
example, Z might be positively related to X and negatively related to Y, 
masking a direct positive association between X and Y.
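
For anyone who wants to show students this masking effect, here is a minimal
simulation sketch. The coefficients (0.5 and -1.0), the sample size, and the
function names are illustrative assumptions of mine, chosen so that the
simple X-Y correlation comes out near zero even though X has a direct
positive effect on Y.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation of two equal-length lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    sab = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    saa = sum((u - mean_a) ** 2 for u in a)
    sbb = sum((v - mean_b) ** 2 for v in b)
    return sab / math.sqrt(saa * sbb)


def partial_r(r_xy, r_xz, r_yz):
    """First-order partial correlation of X and Y, controlling for Z."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


def simulate_masking(n=5000, seed=1):
    """Hypothetical model: Z raises X and lowers Y; X directly raises Y."""
    rng = random.Random(seed)
    z = [rng.gauss(0, 1) for _ in range(n)]
    x = [zi + rng.gauss(0, 1) for zi in z]
    # Direct positive effect of X on Y (0.5), offset by Z's negative effect.
    y = [0.5 * xi - zi + rng.gauss(0, 1) for xi, zi in zip(x, z)]
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return r_xy, r_xz, r_yz, partial_r(r_xy, r_xz, r_yz)


if __name__ == "__main__":
    r_xy, r_xz, r_yz, r_xy_z = simulate_masking()
    print(f"simple  r(X,Y)   = {r_xy:.3f}")
    print(f"simple  r(X,Z)   = {r_xz:.3f}")
    print(f"simple  r(Y,Z)   = {r_yz:.3f}")
    print(f"partial r(X,Y|Z) = {r_xy_z:.3f}")
```

In this setup the simple X-Y correlation should hover around zero while the
partial correlation controlling for Z is clearly positive.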

Take care
Jim

Jim Clark
Professor & Chair of Psychology
204-786-9757
4L41A

-----Original Message-----
From: Mike Palij [mailto:[email protected]] 
Sent: Friday, October 10, 2014 10:10 AM
To: Teaching in the Psychological Sciences (TIPS)
Cc: Michael Palij
Subject: RE: [tips] Spurious Correlations

On Fri, 10 Oct 2014 06:41:38 -0700, Rick Froman wrote:
>One thing I have found helpful in teaching the concept of spurious 
>correlations is to have students populate a number of columns in a 
>spreadsheet with random numbers and then calculate correlations between 
>all the columns of random numbers. Since they are random, the 
>correlation in the population from which all of these samples are drawn 
>is 0. For every 100 correlations calculated in this circumstance, using 
>a .05 alpha level, students will find about five spurious correlations 
>that are statistically significant but are clearly spurious (mind blown) :)

I like this but it works primarily as a mathematical exercise.
The real issue is how to translate what one learns from such exercises to real 
life research situations where one is calculating correlations between 
variables.  Unless one knows the real-life situation/phenomenon really well one 
won't know when a statistically significant correlation is real or a Type I 
error.
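
For those who would rather script the exercise than use a spreadsheet, here
is a rough sketch. The choices of 15 columns (giving 105 pairs), 30 rows per
column, and normally distributed random numbers are my assumptions, not part
of Rick's description; the critical value 2.048 is the two-tailed .05 cutoff
for t with 28 degrees of freedom, so it only applies with 30 rows.

```python
import math
import random
from itertools import combinations

N_ROWS = 30           # observations per column (assumed)
T_CRIT_DF28 = 2.048   # two-tailed .05 critical t for df = N_ROWS - 2 = 28


def pearson(a, b):
    """Pearson correlation of two equal-length lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    sab = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    saa = sum((u - mean_a) ** 2 for u in a)
    sbb = sum((v - mean_b) ** 2 for v in b)
    return sab / math.sqrt(saa * sbb)


def count_significant(n_cols=15, rng=None):
    """Correlate every pair of random columns; count 'significant' results."""
    rng = rng or random.Random()
    cols = [[rng.gauss(0, 1) for _ in range(N_ROWS)] for _ in range(n_cols)]
    hits = total = 0
    for a, b in combinations(cols, 2):
        r = pearson(a, b)
        t = r * math.sqrt((N_ROWS - 2) / (1 - r * r))  # t test of H0: rho = 0
        hits += abs(t) > T_CRIT_DF28
        total += 1
    return hits, total


if __name__ == "__main__":
    rng = random.Random(2014)
    hits, total = count_significant(rng=rng)
    print(f"one class: {hits} of {total} correlations significant at .05")

    # Repeat the exercise many times to see the long-run Type I error rate.
    all_hits = all_total = 0
    for _ in range(200):
        h, t = count_significant(rng=rng)
        all_hits += h
        all_total += t
    print(f"200 classes: proportion significant = {all_hits / all_total:.3f}")
```

Any single run will vary, but over many repetitions the proportion of
"significant" correlations should settle close to .05.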

A minor point:  technically, the example you provide above is not an example 
of a spurious correlation; rather, it is an example of making Type I errors.  
Consider the following distinctions, partly based on Haig's writing (ref below):

(1)  Nonsense Correlations: we have two variables X and Y and they are 
correlated  X <--> Y but there is no reasonable or plausible explanation for 
why such a correlation exists.  Haig uses the example of the high positive 
correlation between human birth rate and the number of storks in Great Britain 
during a period of time (see Haig p127).
Haig notes that Kendall & Buckland (1982), in their dictionary of statistical 
terms, define such a result as an "illusory correlation".
The correlation appears to be real, possibly due to a "butterfly effect" (see 
below), but is not easily explainable.

(2)  Traditional "Spurious Correlations": we have three variables X, Y, and Z 
and X and Y are not directly related at all but both are dependent upon Z, or 
X <-- Z --> Y.  One example I use: if you take all of the cities in the U.S. 
with population over 100,000 and make Y = number of crimes committed and X = 
number of churches in each city, you will probably find a positive correlation 
between number of crimes and number of churches.  The simple-minded solution 
to eliminating this relationship would be to get rid of the churches ("Just 
Say No!") and crime should disappear.
However, smaller cities should have both fewer crimes and churches and larger 
cities should have both more crimes and churches.
So the correlation is probably due to population size:  control for or partial 
out the relationship of population size to the number of crimes and churches 
and you'll probably find that the correlation disappears.
If it doesn't, then consider closing the churches. ;-)
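
To make the "partial out population" step concrete, here is a small
simulation sketch. Everything in it is made up for illustration: the city
populations, the rates of crimes and churches per thousand residents, and the
noise levels are assumptions, not real data.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation of two equal-length lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    sab = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    saa = sum((u - mean_a) ** 2 for u in a)
    sbb = sum((v - mean_b) ** 2 for v in b)
    return sab / math.sqrt(saa * sbb)


def partial_r(r_xy, r_xz, r_yz):
    """First-order partial correlation of X and Y, controlling for Z."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


def simulate_cities(n=300, seed=7):
    """Hypothetical cities where population (Z) drives churches (X) and crimes (Y)."""
    rng = random.Random(seed)
    pop = [rng.uniform(100, 1000) for _ in range(n)]        # thousands of residents
    churches = [1.0 * p + rng.gauss(0, 100) for p in pop]   # X <-- Z
    crimes = [50.0 * p + rng.gauss(0, 5000) for p in pop]   # Z --> Y
    r_xy = pearson(churches, crimes)
    r_xz = pearson(churches, pop)
    r_yz = pearson(crimes, pop)
    return r_xy, partial_r(r_xy, r_xz, r_yz)


if __name__ == "__main__":
    r_xy, r_xy_z = simulate_cities()
    print(f"simple  r(churches, crimes)       = {r_xy:.3f}")
    print(f"partial r(churches, crimes | pop) = {r_xy_z:.3f}")
```

Because churches and crimes have no direct link in this made-up model, the
simple correlation is large but the partial correlation controlling for
population should fall close to zero.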

(3)  Haig's "Spurious Correlations": We have three variables X, Y, and Z and X 
is related to Y but the relationship is mediated by Z, that is, X --> Z --> Y.  
This is an "indirect correlation" (in contrast to a direct correlation 
X <--> Y which is not dependent upon a third variable) and is of interest in 
its own right.  Indeed, mediation and moderation analysis is a popular method 
of analysis, especially for correlational and quasi-experimental designs.
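
As a rough illustration of an indirect correlation, the sketch below
simulates a simple chain X --> Z --> Y with no direct X-to-Y path. The path
weights (0.8) and error variances are arbitrary assumptions. In a chain like
this with independent errors, the X-Y correlation is approximately the
product of the X-Z and Z-Y correlations.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation of two equal-length lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    sab = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    saa = sum((u - mean_a) ** 2 for u in a)
    sbb = sum((v - mean_b) ** 2 for v in b)
    return sab / math.sqrt(saa * sbb)


def simulate_chain(n=5000, seed=3):
    """Hypothetical mediation chain: X affects Y only through Z."""
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(n)]
    z = [0.8 * xi + rng.gauss(0, 0.6) for xi in x]   # X --> Z
    y = [0.8 * zi + rng.gauss(0, 0.6) for zi in z]   # Z --> Y (no X term)
    return pearson(x, y), pearson(x, z), pearson(z, y)


if __name__ == "__main__":
    r_xy, r_xz, r_zy = simulate_chain()
    print(f"r(X,Y)          = {r_xy:.3f}")
    print(f"r(X,Z) * r(Z,Y) = {r_xz * r_zy:.3f}")
```

The X-Y correlation here is real and sizable even though X never appears in
the equation for Y, which is the sense in which it is indirect.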

So, spurious correlations can be tricky things, especially when dealing with 
correlations from uncontrolled situations and/or when one has limited 
knowledge of the phenomenon being studied.

-Mike Palij
New York University
[email protected]

-----Original Message-----
On Friday, October 10, 2014 8:17 AM, Mike Palij wrote:
>On Thu, 09 Oct 2014 18:23:19 -0700, Carol DeVolder wrote:
>>Perhaps others are familiar with this site, but I wasn't. It's a fun 
>>collection of spurious correlations. Good for examples in class.
>> http://tylervigen.com/

For people interested in such things, I suggest taking a look at some of 
Brian Haig's writing on spurious correlations, which provides a more "nuanced"
perspective on them (one can classify spurious correlations into those that 
are truly spurious and those that are not).  Here's the reference for one of 
his articles:

Haig, B. D. (2003). What is a spurious correlation? Understanding
Statistics: Statistical Issues in Psychology, Education, and the Social 
Sciences, 2(2), 125-132.

http://www.tandfonline.com/doi/abs/10.1207/S15328031US0202_03#preview:
or
http://psycnet.apa.org/psycinfo/2004-12710-003

A key point is whether a correlation represents a direct "effect" or 
relationship (which is typically assumed in a correlational analysis) or an 
indirect "effect" or relationship between two or more variables.

Suppose we have three variables X, Y, and Z, where

(1) there is no direct relationship between X and Y but
(2) there is an indirect relationship X -> Z -> Y.

This raises thorny questions of mediation and moderation which I will leave to 
Karl Wuensch to elaborate (or to provide access to his notes on these 
topics ;-).

Haig would probably call the correlations provided on the Tyler Vigen website 
"nonsense correlations" but, for fans of the belief that "everything is 
connected to everything else", one might refer to the "butterfly effect".

The butterfly effect refers to two events that are conceptually unrelated 
(their pairing is apparently nonsensical) but that are connected by a complex 
nonlinear relationship.

Simple correlational analyses that (a) do not have the necessary intermediate 
variables, and/or (b) do not have the necessary nonlinear terms, will not 
accurately represent the relationship or, more correctly, the process that 
connects two variables.
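
Point (b) can be sketched with a toy example. Here Y depends on X only
through a squared term; the uniform range, the noise level, and the quadratic
form are assumptions picked to make the point as starkly as possible.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation of two equal-length lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    sab = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    saa = sum((u - mean_a) ** 2 for u in a)
    sbb = sum((v - mean_b) ** 2 for v in b)
    return sab / math.sqrt(saa * sbb)


def simulate_nonlinear(n=5000, seed=11):
    """Hypothetical process: Y = X squared plus a little noise."""
    rng = random.Random(seed)
    x = [rng.uniform(-1, 1) for _ in range(n)]
    y = [xi ** 2 + rng.gauss(0, 0.05) for xi in x]
    r_linear = pearson(x, y)                         # linear term only
    r_squared_term = pearson([xi ** 2 for xi in x], y)  # with the nonlinear term
    return r_linear, r_squared_term


if __name__ == "__main__":
    r_linear, r_squared_term = simulate_nonlinear()
    print(f"r(X, Y)   = {r_linear:.3f}")
    print(f"r(X^2, Y) = {r_squared_term:.3f}")
```

The simple correlation between X and Y should be near zero even though Y is
almost completely determined by X; the relationship only shows up once the
squared term is included.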

Just something to think about. ;-) 
