Hi

A lot of the discussion of how to interpret correlations involves the presence of a simple correlation, as in the spurious correlation examples. It is equally important to emphasize to students that the absence of correlation is subject to all the same concerns. That is, absence of correlation does not imply absence of relationship between X and Y because of all the same mechanisms. For example, Z might be positively related to X and negatively related to Y, masking a direct positive association between X and Y.
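A small simulation can make this masking case concrete for students. The following is only an illustrative sketch: the coefficients (1 and -2), the sample size, and the seed are assumptions chosen so that Z is positively related to X, negatively related to Y, and cancels a direct positive X-Y association in the population. It uses plain Python with hand-written helpers for the Pearson correlation and the first-order partial correlation.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)


def partial_corr(r_xy, r_xz, r_yz):
    """First-order partial correlation of X and Y, controlling for Z."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


rng = random.Random(2014)  # fixed seed (arbitrary) so the run is repeatable
n = 20000                  # assumed sample size, large enough to limit noise

# Assumed data-generating model (illustrative only):
#   Z is standard normal noise
#   X = Z + noise           -> Z is positively related to X
#   Y = X - 2*Z + noise     -> direct positive effect of X, negative effect of Z
z = [rng.gauss(0, 1) for _ in range(n)]
x = [zi + rng.gauss(0, 1) for zi in z]
y = [xi - 2 * zi + rng.gauss(0, 1) for xi, zi in zip(x, z)]

r_xy = pearson(x, y)
r_xz = pearson(x, z)
r_yz = pearson(y, z)
r_xy_given_z = partial_corr(r_xy, r_xz, r_yz)

print(f"r(X,Y)     = {r_xy:.3f}")
print(f"r(X,Z)     = {r_xz:.3f}")
print(f"r(Y,Z)     = {r_yz:.3f}")
print(f"r(X,Y | Z) = {r_xy_given_z:.3f}")
```

Under these assumed coefficients the population value of r(X,Y) is 0 and the partial correlation controlling for Z is about .71, so the simple correlation should come out near zero while the partial correlation is clearly positive.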
Take care
Jim

Jim Clark
Professor & Chair of Psychology
204-786-9757
4L41A

-----Original Message-----
From: Mike Palij [mailto:[email protected]]
Sent: Friday, October 10, 2014 10:10 AM
To: Teaching in the Psychological Sciences (TIPS)
Cc: Michael Palij
Subject: RE: [tips] Spurious Correlations

On Fri, 10 Oct 2014 06:41:38 -0700, Rick Froman wrote:
>One thing I have found helpful in teaching the concept of spurious
>correlations is to have students populate a number of columns in a
>spreadsheet with random numbers and then calculate correlations between
>all the columns of random numbers. Since they are random, the
>correlation in the population from which all of these samples are drawn
>is 0. For every 100 correlations calculated in this circumstance, using a .05
>alpha level, students will find about five spurious correlations that are
>statistically significant but are clearly spurious (mind blown) :)

I like this but it works primarily as a mathematical exercise. The real issue is how to translate what one learns from such exercises to real-life research situations where one is calculating correlations between variables. Unless one knows the real-life situation/phenomenon really well, one won't know whether a statistically significant correlation is real or a Type I error.

A minor point: technically the example you provide above is not an example of a spurious correlation; rather, it is an example of making Type I errors.

Consider the following distinctions, partly based on Haig's writing (ref below):

(1) Nonsense Correlations: we have two variables X and Y and they are correlated X <--> Y but there is no reasonable or plausible explanation for why such a correlation exists. Haig uses the example of the high positive correlation between human birth rate and the number of storks in Great Britain during a period of time (see Haig p127). Haig notes that Kendall & Buckland (1982) in their dictionary of statistical terms define such a result as an "illusory correlation".
The correlation appears to be real, possibly due to a "butterfly effect" (see below), but is not easily explainable.

(2) Traditional "Spurious Correlations": we have three variables X, Y, and Z, and X and Y are not directly related at all but both are dependent upon Z, or X <-- Z --> Y. One example I use is this: if you take all of the cities in the U.S. with population over 100,000 and make Y = number of crimes committed and X = number of churches in each city, you will probably find a positive correlation between number of crimes and number of churches. The simple-minded solution to eliminating this relationship would be to get rid of the churches ("Just Say No!") and crime should disappear. However, smaller cities should have both fewer crimes and churches and larger cities should have both more crimes and churches. But this is probably due to population size: control or partial out the relationship of population size to the number of crimes and churches and you'll probably find that the correlation disappears. If it doesn't, then consider closing the churches. ;-)

(3) Haig's "Spurious Correlations": we have three variables X, Y, and Z, and X is related to Y but the relationship is mediated by Z, that is, X --> Z ---> Y. This is an "indirect correlation" (in contrast to a direct correlation X <---> Y, which is not dependent upon a third variable) and is of interest in its own right. Indeed, mediation and moderation analysis is a popular method of analysis, especially for correlational and quasi-experimental designs.

So, spurious correlations can be tricky things, especially when dealing with correlations from uncontrolled situations and/or when one has limited knowledge of the phenomenon being studied.

-Mike Palij
New York University
[email protected]

-----Original Message-----
On Friday, October 10, 2014 8:17 AM, Mike Palij wrote:
>On Thu, 09 Oct 2014 18:23:19 -0700, Carol DeVolder wrote:
>>Perhaps others are familiar with this site, but I wasn't. It's a fun
>>collection of spurious correlations.
>>Good for examples in class.
>> http://tylervigen.com/

For people interested in such things, I suggest taking a look at some of Brian Haig's writing on spurious correlations, which provides a more "nuanced" perspective on them (one can classify spurious correlations into those that are truly spurious versus those that are not). Here's the reference for one of his articles:

Haig, B. D. (2003). What is a spurious correlation? Understanding Statistics: Statistical Issues in Psychology, Education, and the Social Sciences, 2(2), 125-132.
http://www.tandfonline.com/doi/abs/10.1207/S15328031US0202_03#preview:
or
http://psycnet.apa.org/psycinfo/2004-12710-003

A key point is whether a correlation represents a direct "effect" or relationship (which is typically assumed in a correlational analysis) or whether an indirect "effect" or relationship exists between two or more variables. If we have three variables X, Y, and Z, and
(1) there is no direct relationship between X and Y but
(2) there is an indirect relationship X -> Z -> Y
then this raises thorny questions of mediation and moderation which I will leave to Karl Wuensch to elaborate (or to provide access to his notes on these topics ;-).

Haig would probably call the correlations provided on the Tyler Vigen website "nonsense correlations" but, for fans of the belief that "everything is connected to everything else", one might refer to the "butterfly effect". The butterfly effect refers to two conceptually unrelated events (apparently nonsensical) which are nonetheless connected by a complex nonlinear relationship. Simple correlational analyses that (a) do not have the necessary intermediate variables, and/or (b) do not have the necessary nonlinear terms, will not accurately represent the relationship or, more correctly, the process that connects two variables.

Just something to think about. ;-)

---
You are currently subscribed to tips as: [email protected].
To unsubscribe click here: http://fsulist.frostburg.edu/u?id=13251.645f86b5cec4da0a56ffea7a891720c9&n=T&l=tips&o=39064
or send a blank email to leave-39064-13251.645f86b5cec4da0a56ffea7a89172...@fsulist.frostburg.edu
