If the goal is to generate directly from the empirical distribution, then 
sampling with replacement is the best choice (as others have already 
suggested).  But the reference in the original post to the normal and beta 
distributions suggested to me that the original poster may want a smooth 
approximation to the empirical distribution rather than its step function, 
without being locked into a specific parametric family.  The logspline package 
has functions for this.  Its advantages are that it gives a smooth (non-step) 
estimate of the cdf that can be plotted, and that the points it generates are 
based on the observed data but can fall outside the original range of the 
data and will have fewer ties.  
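
A minimal sketch of the logspline approach described above, assuming the 
logspline package is installed from CRAN.  The vector x is simulated stand-in 
data (the gamma shape is an arbitrary choice for illustration), not the 
original poster's data.

```r
library(logspline)  # assumed installed: install.packages("logspline")

set.seed(1)
# Stand-in for the 10,000 observed values (hypothetical data)
x <- rgamma(10000, shape = 2, rate = 1)

# Fit a logspline density estimate to the observed data
fit <- logspline(x)

# Draw 20,000 new values from the fitted smooth distribution
new_x <- rlogspline(20000, fit)

# Overlay the smooth estimated cdf on the empirical step function
plot(ecdf(x), main = "Empirical vs. logspline cdf")
grid <- seq(min(x), max(x), length.out = 200)
lines(grid, plogspline(grid, fit), col = "red")
```

If the data have a known bound (for example, strictly positive values), 
logspline() also accepts lbound and ubound arguments, which keeps generated 
values inside that bound.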

Whether these "advantages" make any difference depends on what is to be done 
with the generated observations.  For many applications the difference is 
probably negligible, and using sample() is the simplest and best approach.  
But there may be some uses for which these "advantages" are beneficial.  
(Using sample() and then adding a small random "error" to each value is 
another option, but I like the logspline option better.)
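
For comparison, the two sample()-based options can be sketched in base R as 
follows.  The data are again a simulated stand-in, and using bw.nrd0() to set 
the size of the random "error" is an assumption made for this example, not 
something specified in the thread.

```r
set.seed(1)
# Stand-in for the 10,000 observed values (hypothetical data)
x <- rgamma(10000, shape = 2, rate = 1)

# Option 1: sample with replacement from the empirical distribution
boot_x <- sample(x, 20000, replace = TRUE)

# Option 2: resample, then add a small random "error" to each value.
# The noise sd is a kernel bandwidth from bw.nrd0(); this choice is an
# assumption for illustration only.
h <- bw.nrd0(x)
smooth_x <- sample(x, 20000, replace = TRUE) + rnorm(20000, sd = h)

# The plain resample only repeats observed values, so it must contain ties
all(boot_x %in% x)         # TRUE
anyDuplicated(boot_x) > 0  # TRUE: 20,000 draws from 10,000 values
```

With normal noise, option 2 amounts to drawing from a Gaussian kernel density 
estimate with bandwidth h, so like logspline it can produce values outside 
the observed range.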

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111


> -----Original Message-----
> From: Frank Harrell [mailto:f.harr...@vanderbilt.edu]
> Sent: Tuesday, July 27, 2010 4:54 PM
> To: Greg Snow
> Cc: xin wei; r-help@r-project.org
> Subject: Re: [R] how to generate a random data from a empirical
> distribition
> 
> Easiest thing is to sample with replacement from the original data.
> This is the idea behind the bootstrap, which is sampling from the
> empirical CDF.
> 
> Frank E Harrell Jr   Professor and Chairman        School of Medicine
>                      Department of Biostatistics   Vanderbilt University
> 
> On Tue, 27 Jul 2010, Greg Snow wrote:
> 
> > Another option for fitting a smooth distribution to data (and
> > generating future observations from the smooth distribution) is to use
> > the logspline package.
> >
> > --
> > Gregory (Greg) L. Snow Ph.D.
> > Statistical Data Center
> > Intermountain Healthcare
> > greg.s...@imail.org
> > 801.408.8111
> >
> >
> >> -----Original Message-----
> >> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
> >> On Behalf Of xin wei
> >> Sent: Monday, July 26, 2010 12:36 PM
> >> To: r-help@r-project.org
> >> Subject: [R] how to generate a random data from a empirical
> >> distribition
> >>
> >>
> >> hi, this is more a statistical question than an R question, but I do
> >> want to know how to implement this in R.
> >> I have 10,000 data points. Is there any way to generate an empirical
> >> probability distribution from them (the problem is that I do not know
> >> what exactly this distribution follows: normal, beta?). My ultimate goal
> >> is to generate an additional 20,000 data points from this empirical
> >> distribution created from the existing 10,000 data points.
> >> thank you all in advance.
> >>
> >>
> >> --
> >> View this message in context: http://r.789695.n4.nabble.com/how-to-generate-a-random-data-from-a-empirical-distribition-tp2302716p2302716.html
> >> Sent from the R help mailing list archive at Nabble.com.
> >>
> >> ______________________________________________
> >> R-help@r-project.org mailing list
> >> https://stat.ethz.ch/mailman/listinfo/r-help
> >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> >> and provide commented, minimal, self-contained, reproducible code.
