Hi:

On Mon, Jul 26, 2010 at 11:36 AM, xin wei <xin...@stat.psu.edu> wrote:

>
> hi, this is more a statistical question than an R question, but I do want to
> know how to implement this in R.
> I have 10,000 data points. Is there any way to generate an empirical
> probability distribution from them (the problem is that I do not know what
> distribution the data follow: normal, beta?). My ultimate goal is to
> generate an additional 20,000 data points from this empirical distribution
> created from the existing 10,000 data points.
> thank you all in advance.
>

The problem, it seems to me, is the leap of faith you're taking that the
empirical distribution of your manifest sample will serve as a useful
data-generating mechanism for the 20,000 future observations you want to
take. I would think that, if you intend to take a sample of 20,000 from ANY
distribution, you would want some confidence in the specification of said
distribution.
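
For reference, sampling from the empirical distribution itself amounts to
resampling the observed values with replacement. Here is a minimal sketch,
assuming your data sit in a numeric vector x (simulated below purely as a
placeholder; the names x, new_x and new_smooth are mine). The smoothed
variant adds Gaussian noise with the default bandwidth used by density(),
which is one common choice, not the only one.

```r
set.seed(1)
# Placeholder standing in for the real 10,000 observations (assumption).
x <- rgamma(10000, shape = 2, rate = 1.5)

# Plain bootstrap: every new value is one of the observed values.
new_x <- sample(x, size = 20000, replace = TRUE)

# Smoothed bootstrap: equivalent to sampling from a Gaussian kernel
# density estimate of x with bandwidth bw (the default rule in density()).
bw <- bw.nrd0(x)
new_smooth <- sample(x, size = 20000, replace = TRUE) +
  rnorm(20000, mean = 0, sd = bw)
```

Note that the plain version can never produce a value outside the observed
set, and the smoothed version can stray outside the true support (e.g.,
negative values for positive data). Neither addresses whether the sample is
a trustworthy stand-in for the population.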

Even if you don't know exactly what type of population distribution you're
dealing with, there are ways to narrow down the set of possibilities. What
is the domain/support of the distribution? For example, the Normal is
defined on all of R (as in the real numbers, not our favorite statistical
programming language), whereas the lognormal, Gamma and Weibull
distributions, among others, are defined on the positive reals (so exact
zeros in the data need attention before fitting them). The beta
distribution is defined on [0, 1]. Therefore, knowledge of the domain is
useful in and of itself. Is it plausible that the distribution is symmetric,
or should it have a distinct left or right skew? (Similar comments apply to
discrete distributions.) Is censoring or truncation a relevant concern? If
there is a random process that well describes how the data you observe are
generated, that will certainly narrow down the class of potential
data-generating mechanisms/distributions.
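
A few of these questions can be checked quickly from the sample. This is
only a rough sketch, again assuming the data are in a numeric vector x
(simulated here as a placeholder); the moment-based skewness below is one
of several definitions in use.

```r
set.seed(1)
# Placeholder standing in for the real 10,000 observations (assumption).
x <- rgamma(10000, shape = 2, rate = 1.5)

# Support: where do the data live?
print(range(x))
has_nonpositive <- any(x <= 0)            # rules out lognormal/Gamma/Weibull as is
in_unit_interval <- all(x >= 0 & x <= 1)  # a beta is only a candidate if TRUE

# Shape: moment-based sample skewness (about 0 for symmetric data,
# positive for a right skew, negative for a left skew).
skew <- mean((x - mean(x))^3) / sd(x)^3
print(skew)

# Discreteness: far fewer distinct values than observations suggests
# a discrete (or heavily rounded) variable.
print(length(unique(x)) / length(x))
```

A histogram with hist(x) and a normal Q-Q plot with qqnorm(x) are worth
looking at alongside these numbers.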

Once you've narrowed down the class of possible distributions as much as
possible, you could look into the fitdistr() function in MASS or the
fitdistrplus package on CRAN to test which candidates seem plausible with
respect to your existing sample and which do not. You are not likely to be
able to narrow it down to one family of distributions. After going through
this exercise, though, you should have a much better idea of the
characteristics of the distribution that gave rise to your sample of 10,000
(assuming, of course, that it is a *random* sample), and you can apply that
knowledge to the generation of the next 20,000 observations.
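
As a rough illustration of that workflow with fitdistr(), here is a sketch
that fits three positive-support candidates, ranks them by AIC, and
simulates from the best-ranked one. The data vector x is simulated as a
placeholder, the choice of candidates is an assumption for the example, and
AIC is just one convenient way to compare fits.

```r
library(MASS)  # provides fitdistr()

set.seed(42)
# Placeholder standing in for the real 10,000 observations (assumption).
x <- rgamma(10000, shape = 2, rate = 1.5)

# Fit each candidate family by maximum likelihood. The optimizer may emit
# NaN warnings while searching, so they are suppressed here.
candidates <- c("gamma", "lognormal", "weibull")
fits <- lapply(candidates, function(d) suppressWarnings(fitdistr(x, d)))
names(fits) <- candidates

# Rank the fits by AIC (smaller is better). This only compares the
# candidates with each other; it does not show that any of them is adequate.
aic <- sapply(fits, AIC)
print(sort(aic))

# Simulate 20,000 new observations from the best-ranked family.
best <- names(which.min(aic))
est <- fits[[best]]$estimate
new_obs <- switch(best,
  gamma     = rgamma(20000, shape = est[["shape"]], rate = est[["rate"]]),
  lognormal = rlnorm(20000, meanlog = est[["meanlog"]], sdlog = est[["sdlog"]]),
  weibull   = rweibull(20000, shape = est[["shape"]], scale = est[["scale"]])
)
```

It would still be wise to compare the fitted density against a histogram or
Q-Q plot of x before trusting the simulated values.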

OTOH, if your existing 10,000 observations were not produced by some random
process, all bets are off.

HTH,
Dennis

>
>
> --
> View this message in context:
> http://r.789695.n4.nabble.com/how-to-generate-a-random-data-from-a-empirical-distribition-tp2302716p2302716.html
> Sent from the R help mailing list archive at Nabble.com.
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

