Just to add to what Karl has written below:

(1) There is a Wikipedia entry for what they call "Anscombe's quartet", which refers to the four sets of x-y values; see:
http://en.wikipedia.org/wiki/Anscombe%27s_quartet
(2) For some people, having the Anscombe data in an Excel worksheet is more convenient because one can get the appropriate descriptive statistics, correlations, and scatterplots there. It should be easy enough to create such an Excel file (I have one that I use in class).

(3) The Anscombe data were first published in the journal The American Statistician, which is available through www.jstor.org. One copy floating around on the internet is available at:
http://sciencepolicy.colorado.edu/about_us/meet_us/roger_pielke/envs_5120/week_16/Anscombe.pdf
but I don't recommend it as background reading for students because it is written for statisticians and assumes knowledge that students may have only by the end of a course.

(4) The second URL that Karl provides below goes to a webpage that has links to other material on correlation and regression. One site presents scatterplots and the person has to guess the value of the Pearson r. This might be a useful adjunct to the Anscombe dataset; see:
http://istics.net/stat/Correlations/

-Mike Palij
New York University
[email protected]

On Thu, 18 Feb 2010 19:43:10 -0800, Karl L Wuensch wrote:
>The Anscombe data (strongly recommended):
>SAS
>data PW; input x1 y1 x2 y2 x3 y3 x4 y4; cards;
>10 8.04 10 9.14 10 7.46 8 6.58
> 8 6.95 8 8.14 8 6.77 8 5.76
>13 7.58 13 8.74 13 12.74 8 7.7
> 9 8.81 9 8.77 9 7.11 8 8.84
>11 8.33 11 9.26 11 7.81 8 8.47
>14 9.96 14 8.10 14 8.84 8 7.04
>6 7.24 6 6.13 6 6.08 8 5.25
>4 4.26 4 3.10 4 5.39 19 12.50
>12 10.84 12 9.13 12 8.15 8 5.56
>7 4.82 7 7.26 7 6.42 8 7.91
>5 5.68 5 4.74 5 5.73 8 6.89
>;
>proc reg simple; A: model y1 = x1; plot y1 * x1;
> B: model y2 = x2; plot y2 * x2;
> C: model y3 = x3; plot y3 * x3;
> D: model y4 = x4; plot y4 * x4; run;
>
>SPSS: Bring CORR_REGR.SAV (available at
>http://core.ecu.edu/psyc/wuenschk/SPSS/SPSS-Data.htm )
>into SPSS. From the Data Editor, click Data, Split File,
>Organize Output by Groups, and scoot Set into the
>"Organize output by groups" box.
>Click Analyze, Regression, Linear.
>Scoot Y into the Dependent box and X into the Independent(s)
>box. Click Stat and ask for Descriptives (Estimates and Model
>Fit should already be selected).
>Click Continue, OK. Click Graphs, Scatter, Simple.
>Identify Y as the Y variable and X as the X variable.
>Click OK.
>
> Look at the output. For each of the data sets, the
>mean on X is 9, the mean on Y is 7.5, the standard deviation
>for X is 3.32, the standard deviation for Y is 2.03, the
>r is .816, and the regression equation is Y = 3 + .5X -
>but now look at the plots. In Set A, we have a plot that
>looks about like what we would expect for a moderate to
>large positive correlation. In Set B we see that the
>relationship is really curvilinear, and that the data
>could be fit much better with a curved line (a polynomial
>function, quadratic, would fit them well). In Set C we
>see that, with the exception of one outlier, the relationship
>is nearly perfectly linear. In Set D we see that the
>relationship would be zero if we eliminated the one extreme
>outlier -- with no variance in X, there can be no covariance
>with Y.
>
>Also of possible interest:
>http://core.ecu.edu/psyc/wuenschk/StatHelp/Linear-Games.htm
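
For anyone without SAS, SPSS, or Excel at hand, the summary statistics Karl quotes can be checked with a few lines of Python. This is only a rough sketch using the standard library, not anything from Karl's or my class materials. The names ANSCOMBE and summarize are made up for illustration. One assumption to flag: the listing above shows 7.7 for the third y4 value, while Anscombe's published value is 7.71, which is what the sketch uses. The last few lines drop the single x = 19 case from Set D to illustrate the "no variance in X" point.

```python
from math import sqrt

# Anscombe's four data sets, keyed by the labels used in the SAS code above.
# Sets A, B, and C share the same x values.
X_ABC = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
ANSCOMBE = {
    "A": (X_ABC, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "B": (X_ABC, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "C": (X_ABC, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    # Assumption: 7.71 (Anscombe's published value) rather than the 7.7 in the listing.
    "D": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
          [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarize(x, y):
    """Return means, sample SDs, Pearson r, and least-squares slope/intercept.

    r, slope, and intercept are None when x or y has zero variance,
    because the correlation is undefined in that case.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)  # sum of squares for x
    syy = sum((yi - mean_y) ** 2 for yi in y)  # sum of squares for y
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    result = {
        "mean_x": mean_x,
        "mean_y": mean_y,
        "sd_x": sqrt(sxx / (n - 1)),
        "sd_y": sqrt(syy / (n - 1)),
        "r": None,
        "slope": None,
        "intercept": None,
    }
    if sxx > 0 and syy > 0:
        slope = sxy / sxx
        result["r"] = sxy / sqrt(sxx * syy)
        result["slope"] = slope
        result["intercept"] = mean_y - slope * mean_x
    return result


for name, (x, y) in ANSCOMBE.items():
    s = summarize(x, y)
    print(f"Set {name}: mean_x={s['mean_x']:.2f} mean_y={s['mean_y']:.2f} "
          f"sd_x={s['sd_x']:.2f} sd_y={s['sd_y']:.2f} r={s['r']:.3f} "
          f"y = {s['intercept']:.2f} + {s['slope']:.2f}x")

# Set D without its one extreme case (x = 19): x no longer varies at all.
x_d, y_d = ANSCOMBE["D"]
kept = [(xi, yi) for xi, yi in zip(x_d, y_d) if xi != 19]
d_trimmed = summarize([p[0] for p in kept], [p[1] for p in kept])
print("Set D without the outlier: sd_x =", d_trimmed["sd_x"], "r =", d_trimmed["r"])
```

The numbers agree across the four sets only to about two or three decimal places, which is why the plots, not the printed statistics, are what tell the sets apart.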
