Just to add to what Karl has written below:

(1) There is a Wikipedia entry for what is called
"Anscombe's quartet", which refers to the four sets of x-y
values; see:
http://en.wikipedia.org/wiki/Anscombe%27s_quartet

(2) For some people, having the Anscombe data in an Excel
worksheet is more convenient because one can get the
appropriate descriptive statistics, correlations, and
scatterplots.  It should be easy enough to create such
an Excel file (I have one that I use in class).
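One way to build such a file without retyping the numbers is to write them out as a CSV file, which Excel opens directly. The following is only a minimal sketch in Python (standard library only). The function names and the file name anscombe.csv are made up for illustration, and the values are those in Anscombe's published table.

```python
import csv
import io

# Column order matches the SAS input statement: x1 y1 x2 y2 x3 y3 x4 y4.
HEADER = ["x1", "y1", "x2", "y2", "x3", "y3", "x4", "y4"]

# Values as published in Anscombe (1973), one row per observation.
ROWS = [
    (10, 8.04, 10, 9.14, 10, 7.46, 8, 6.58),
    (8, 6.95, 8, 8.14, 8, 6.77, 8, 5.76),
    (13, 7.58, 13, 8.74, 13, 12.74, 8, 7.71),
    (9, 8.81, 9, 8.77, 9, 7.11, 8, 8.84),
    (11, 8.33, 11, 9.26, 11, 7.81, 8, 8.47),
    (14, 9.96, 14, 8.10, 14, 8.84, 8, 7.04),
    (6, 7.24, 6, 6.13, 6, 6.08, 8, 5.25),
    (4, 4.26, 4, 3.10, 4, 5.39, 19, 12.50),
    (12, 10.84, 12, 9.13, 12, 8.15, 8, 5.56),
    (7, 4.82, 7, 7.26, 7, 6.42, 8, 7.91),
    (5, 5.68, 5, 4.74, 5, 5.73, 8, 6.89),
]


def anscombe_csv_text():
    """Return the four data sets as CSV text: a header row plus 11 data rows."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(HEADER)
    writer.writerows(ROWS)
    return buf.getvalue()


def write_anscombe_csv(path="anscombe.csv"):
    """Write the CSV text to a file (hypothetical default name) for Excel."""
    with open(path, "w", newline="") as f:
        f.write(anscombe_csv_text())
```

Calling write_anscombe_csv() creates the file in the current directory. Once it is open in Excel, the descriptive statistics, correlations, and scatterplots can be produced per pair of columns in the usual way.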

(3) The Anscombe data was first published in the journal
The American Statistician which is available through
www.jstor.org.  One copy floating around on the internets
is available at:
http://sciencepolicy.colorado.edu/about_us/meet_us/roger_pielke/envs_5120/week_16/Anscombe.pdf
but I don't recommend it as background reading to students
because it is written for statisticians and assumes
knowledge that students may have only by the end of a
course.

(4) The second URL that Karl provides below goes to a
webpage that has links to other material on correlation
and regression.  One site presents scatterplots and the
person has to guess the value of the Pearson r.
This might be a useful adjunct to the Anscombe dataset;
see:
http://istics.net/stat/Correlations/

-Mike Palij
New York University
[email protected]



On Thu, 18 Feb 2010 19:43:10 -0800, Karl L Wuensch wrote:
>The Anscombe data (strongly recommended):
>SAS
>data PW; input x1 y1 x2 y2 x3 y3 x4 y4; cards;
>10 8.04       10 9.14       10 7.46         8 6.58
> 8 6.95        8 8.14        8 6.77         8 5.76
>13 7.58       13 8.74       13 12.74        8 7.71
> 9 8.81        9 8.77        9 7.11         8 8.84
>11 8.33       11 9.26       11 7.81         8 8.47
>14 9.96       14 8.10       14 8.84         8 7.04
>6 7.24         6 6.13        6 6.08         8 5.25
>4 4.26         4 3.10        4 5.39        19 12.50
>12 10.84      12 9.13       12 8.15         8 5.56
>7 4.82         7 7.26        7 6.42         8 7.91
>5 5.68         5 4.74        5 5.73         8 6.89
>;
>proc reg simple; A: model y1 = x1; plot y1 * x1;
>  B: model y2 = x2; plot y2 * x2;
>  C: model y3 = x3; plot y3 * x3;
>  D: model y4 = x4; plot y4 * x4;  run;
>
>SPSS:  Bring CORR_REGR.SAV (available at 
>http://core.ecu.edu/psyc/wuenschk/SPSS/SPSS-Data.htm  ) 
>into SPSS.  From the Data Editor, click Data, Split File, 
>Organize Output by Groups, and scoot Set into the 
>"Organize output by groups" box.  Click Analyze, Regression, 
>Linear.  
>Scoot Y into the Dependent box and X into the Independent(s) 
>box.  Click Stat and ask for Descriptives (Estimates and Model 
>Fit should already be selected).  
>Click Continue, OK.  Click Graphs, Scatter, Simple.  
>Identify Y as the Y variable and X as the X variable.  
>Click OK.
>
>       Look at the output.  For each of the data sets, the 
>mean on X is 9, the mean on Y is 7.5, the standard deviation 
>for X is 3.32, the standard deviation for Y is 2.03, the 
>r is .816, and the regression equation is Y = 3 + .5X - 
>but now look at the plots.  In Set A, we have a plot that 
>looks about like what we would expect for a moderate to 
>large positive correlation.  In Set B we see that the 
>relationship is really curvilinear, and that the data 
>could be fit much better with a curved line (a quadratic 
>polynomial would fit them well).  In Set C we 
>see that, with the exception of one outlier, the relationship 
>is almost perfectly linear.  In Set D we see that there 
>would be no relationship left if we eliminated the one extreme 
>outlier -- with no variance in X, there can be no covariance 
>with Y.
>
>Also of possible interest:  
>http://core.ecu.edu/psyc/wuenschk/StatHelp/Linear-Games.htm 
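For anyone without SAS or SPSS at hand, the summary figures Karl lists can be checked with a few lines of Python. This is only a rough sketch using the standard library. The names SETS and summarize are invented for illustration, the labels A through D follow the SAS model labels, and the standard deviations use the n - 1 denominator (an assumption, though it is what SAS and SPSS report by default).

```python
from math import sqrt

# Values as published in Anscombe (1973); sets A-C share the same X values.
X_ABC = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
SETS = {
    "A": (X_ABC, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "B": (X_ABC, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "C": (X_ABC, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "D": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
          [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarize(x, y):
    """Means, sample SDs, Pearson r, and least-squares line for one data set."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)   # sum of squares for X
    syy = sum((b - mean_y) ** 2 for b in y)   # sum of squares for Y
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return {
        "mean_x": mean_x,
        "mean_y": mean_y,
        "sd_x": sqrt(sxx / (n - 1)),
        "sd_y": sqrt(syy / (n - 1)),
        "r": sxy / sqrt(sxx * syy),
        "intercept": mean_y - slope * mean_x,
        "slope": slope,
    }


for label, (x, y) in SETS.items():
    stats = summarize(x, y)
    print(label, {name: round(value, 3) for name, value in stats.items()})

# Set D with the single outlier (X = 19) dropped: X no longer varies at all,
# so the sum of squares for X is zero and neither r nor a slope can be computed.
x_d, _ = SETS["D"]
x_rest = [a for a in x_d if a != 19]
print("Distinct X values left in Set D:", sorted(set(x_rest)))
```

The four printed lines agree to two or three decimals, which is the whole point of the data: the numbers alone cannot distinguish the sets, so the scatterplots still have to be drawn with whatever tool is at hand.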