Is ?cut what you need? I have put a couple of rough sketches below, after your message.

Sean
On 6/16/04 6:52 AM, "Dan Bolser" <[EMAIL PROTECTED]> wrote:
>
> First, thanks to everyone who helped me get to grips with R in (x)emacs
> (I get confused easily). Special thanks to Stephen Eglen for continued
> support.
>
> My question is about non-linear binning, or density functions over
> distributions governed by a power law ...
>
> y ~ mu*x**lambda  # In one of its forms
>                   # (can't find Pareto in the online help)
>
> Looking at the following should show my problem....
>
> x3 <- runif(10000)**3  # Probably a better (correct) way to do this
>
> plot(density(x3, cut=0, bw=0.1))
> plot(density(x3, cut=0, bw=0.01))
> plot(density(x3, cut=0, bw=0.001))
>
> plot(density(x3, cut=0, bw=0.1),   log='xy')
> plot(density(x3, cut=0, bw=0.01),  log='xy')
> plot(density(x3, cut=0, bw=0.001), log='xy')
>
> The upper three plots show that the bw has a big effect on the appearance
> of the graph by rescaling based on the initial density at low values of x,
> which is very high.
>
> The lower plots show (I think) an error in the use of linear bins to view
> a non-linear trend. I would expect this curve to be linear on log-log
> scales (from experience), and you can see the expected behavior in the
> tails of these plots.
>
> If you play with drawing these curves on top of each other they look OK
> apart from at the beginning. However, changing the bandwidth to 0.0001 has
> a radical effect on these plots, and they begin to show a different trend
> (they look like they are being governed by a different power).
>
> Hmmm....
>
> x3log <- -log(x3)
>
> plot( density(x3log, cut=0, bw=0.5),  log='y', col=1)
> lines(density(x3log, cut=0, bw=0.2),  col=2)
> lines(density(x3log, cut=0, bw=0.1),  col=3)
> lines(density(x3log, cut=0, bw=0.01), col=4)
>
> Sorry...
>
> 'Real' data of this form is usually discrete, with the value of 1 being
> the most frequent (minimum) event and higher values occurring less
> frequently according to a power (power law). Such data can easily be
> grouped into discrete bins and the frequencies plotted on log scales. The
> continuous data generated above requires some form of density estimation,
> or rescaling into discrete values (make the smallest value equal to 1 and
> round everything else to an integer).
>
> I see the aggregate function, but which function lets me simply count the
> number of values in a class (integer bin)?
>
> The analysis of even the discretized data is made more accurate by the use
> of exponentially growing bins. That way you don't need to plot the data on
> log scales, and the increasing variance associated with lower-probability
> events is handled by the increasing bin size (giving a good fit of the
> power). How can I easily (ignorantly) implement exponentially increasing
> bin sizes?
>
> Thanks for any feedback,
>
> Dan.
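
To count how many values fall in each integer class, table() is probably all
you need; cut() then lets you count within arbitrary (for example,
exponentially growing) breaks. Here is a rough, untested sketch of both
ideas. I am assuming your values sit in a positive vector called x, and the
names xi, tab, breaks, counts, dens and mids are just placeholders of mine:

## Count values per integer bin (rescale so the smallest value is ~1)
xi  <- round(x / min(x))
tab <- table(xi)                      # frequency of each integer value
plot(as.numeric(names(tab)), as.numeric(tab), log = "xy",
     xlab = "value", ylab = "frequency")

## Exponentially growing bins via cut(): edges at 1, 2, 4, 8, ...
breaks <- 2^(0:ceiling(log2(max(xi))))
counts <- table(cut(xi, breaks = breaks, include.lowest = TRUE))
dens   <- as.numeric(counts) / diff(breaks)          # normalise by bin width
mids   <- sqrt(breaks[-length(breaks)] * breaks[-1]) # geometric bin centres
plot(mids, dens, log = "xy", xlab = "value", ylab = "density")
## (empty bins give zeros, which are simply dropped from the log-log plot)

Dividing the counts by the bin widths is the step that matters: without it
the growing bins inflate the counts in the tail and distort the slope you
would read off the log-log plot.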

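On the simulation side: there is no Pareto generator in base R, but you can
draw from one by inverting the CDF. Another untested sketch (alpha and xm
are just values I picked for illustration, and xp, u, xs are my own names):

## Pareto(xm, alpha): P(X > x) = (xm/x)^alpha for x >= xm
alpha <- 2
xm    <- 1
u     <- runif(10000)
xp    <- xm / u^(1/alpha)

## Sanity check: on log-log scales the empirical survival function
## should be roughly a straight line with slope -alpha
xs <- sort(xp)
plot(xs, 1 - (seq_along(xs) - 0.5)/length(xs), log = "xy",
     xlab = "x", ylab = "P(X > x)")

For what it's worth, runif(10000)**3 gives a bounded power law on (0,1)
with density proportional to x^(-2/3), so I suspect the pile-up near zero
(rather than a heavy right tail) is what the kernel bandwidth is struggling
with in your density() plots.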