Re: [R-sig-teaching] Creating data

Christophe Genolini Tue, 24 Mar 2009 13:16:57 -0700

Hi Scott

When I create artificial data, I try to "copy" some real data. So I
measure the real data with as many parameters as I can (mean, variance,
covariance, but also percentage of NA and outliers), then I generate the
artificial data. It is also possible to generate several sets that I
finally mix (like one set for men and one set for women). Then I remove
the variable "gender", merge the two sets, and shuffle the resulting set.
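
A minimal sketch of that procedure in R, under some loud assumptions: the
data frame `real` is itself simulated here only so the example runs on its
own, the helper `simulate_group()` and all variable names are hypothetical,
and only the mean, standard deviation, and NA rate are copied (covariances
and outliers are left out for brevity).

```r
# Sketch only: `real` stands in for a genuine data set; all names are made up.
set.seed(42)
real <- data.frame(
  gender = rep(c("M", "F"), each = 50),
  score  = c(rnorm(50, mean = 60, sd = 12), rnorm(50, mean = 66, sd = 9))
)
real$score[sample(nrow(real), 8)] <- NA  # pretend the real data have gaps

# Measure one group of the real data, then generate a look-alike group.
simulate_group <- function(x, n) {
  m    <- mean(x, na.rm = TRUE)   # observed mean
  s    <- sd(x, na.rm = TRUE)     # observed standard deviation
  p_na <- mean(is.na(x))          # observed proportion of missing values
  out  <- rnorm(n, mean = m, sd = s)
  out[runif(n) < p_na] <- NA      # inject NAs at roughly the observed rate
  out
}

men   <- simulate_group(real$score[real$gender == "M"], n = 200)
women <- simulate_group(real$score[real$gender == "F"], n = 200)

# Merge the two sets without the "gender" variable, then shuffle the rows.
fake <- data.frame(score = sample(c(men, women)))
```

Because the gender label is dropped, the merged scores form a mixture of
two distributions, which tends to look less "neat" than a single rnorm()
draw.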



Christophe

Hi everyone-

I'm currently teaching a graduate course in statistics for linguistics
using R. I have used up most of the 'authentic' data I have been able
to collect for homework and demonstrations. I can think of plenty more
possible data sets, but I am finding the creation of them challenging,
and my creations are often somewhat unrealistic (generally, too
'neat' and obvious).

So, I was wondering if anyone had any tips on creating 'realistic'
data sets, or links/books that describe it.

For a simple example, let's say I want to create a dataset with
students from different countries and academic departments who took an
English test. I want to make some differences (significant and not)
and possibly even interactions among the scores by country and
department. I have been doing this through various iterations of
sample() and rnorm(), and jitter() to get some randomness, but things
are still coming out pretty neatly.  Is this the right (or a good)
method? Advice?
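
For concreteness, the kind of rnorm()-based approach described above might
look like the following sketch. The countries, departments, effect sizes,
cell counts, and the 0-100 score range are all invented for illustration;
none of them come from real data.

```r
# Sketch only: every name and number below is a made-up assumption.
set.seed(1)
design <- expand.grid(
  country = c("Japan", "Brazil", "Germany"),
  dept    = c("Linguistics", "Engineering"),
  stringsAsFactors = FALSE
)

# Main effects for country and department, added to a grand mean of 70.
country_eff <- c(Japan = 0, Brazil = -3, Germany = 5)
dept_eff    <- c(Linguistics = 4, Engineering = 0)
design$mu <- 70 + unname(country_eff[design$country]) +
  unname(dept_eff[design$dept])

# One interaction: an extra boost for a single country/department cell.
boost <- design$country == "Brazil" & design$dept == "Linguistics"
design$mu[boost] <- design$mu[boost] + 6

# Unequal cell sizes make the data look less tidy than a balanced design.
design$n <- c(25, 12, 18, 9, 30, 14)

# Expand to one row per student and add normally distributed noise.
cells <- design[rep(seq_len(nrow(design)), times = design$n),
                c("country", "dept", "mu")]
cells$score <- round(rnorm(nrow(cells), mean = cells$mu, sd = 10))
cells$score <- pmin(pmax(cells$score, 0), 100)  # keep scores within 0-100

# Shuffle the rows and drop the true cell means before handing out the data.
scores <- cells[sample(nrow(cells)), c("country", "dept", "score")]
rownames(scores) <- NULL
```

Something like summary(aov(score ~ country * dept, data = scores)) could
then be used to check which effects reach significance in a given draw;
with noise this large, some of the built-in effects may well not.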

Thanks in advance-

SFK


_______________________________________________
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-teaching
