On Wed, 16 Jun 2004, Sean Davis wrote:

>Is ?cut what you need?

This is giving the cleanest results yet.

Cheers,
Dan.

>
>Sean
>
>
>On 6/16/04 6:52 AM, "Dan Bolser" <[EMAIL PROTECTED]> wrote:
>
>>
>> First, thanks to everyone who helped me get to grips with R in (x)emacs
>> (I get confused easily). Special thanks to Stephen Eglen for continued
>> support.
>>
>> My question is about non-linear binning, or density functions over
>> distributions governed by a power law ...
>>
>> y ~ mu*x**lambda # In one of its forms
>> # (can't find Pareto in the online help)
>>
>> Looking at the following should show my problem....
>>
>> x3 <- runif(10000)**3 # Probably a better (correct) way to do this
>>
>> plot( density(x3,cut=0,bw=0.1))
>> plot( density(x3,cut=0,bw=0.01))
>> plot( density(x3,cut=0,bw=0.001))
>>
>> plot(density(x3,cut=0,bw=0.1), log='xy')
>> plot(density(x3,cut=0,bw=0.01), log='xy')
>> plot(density(x3,cut=0,bw=0.001),log='xy')
>>
>> The upper three plots show that the bandwidth (bw) has a big effect on the
>> appearance of the graph by rescaling based on the initial density at low
>> values of x, which is very high.
>>
>> The lower plots show (I think) an error in the use of linear bins to view
>> a non-linear trend. I would expect this curve to be linear on log-log
>> scales (from experience), and you can see the expected behavior in the
>> tails of these plots.
>>
>> If you play with drawing these curves on top of each other they look OK
>> apart from at the beginning. However, changing the bandwidth to 0.0001 has
>> a radical effect on these plots, and they begin to show a different trend
>> (they look like they are being governed by a different power).
>>
>> Hmmm....
>>
>> x3log <- -log(x3)
>>
>> # log='y' is set once, on the plot() call; lines() draws onto the same axes
>> plot( density(x3log,cut=0,bw=0.5), log='y',col=1)
>>
>> lines(density(x3log,cut=0,bw=0.2), col=2)
>> lines(density(x3log,cut=0,bw=0.1), col=3)
>> lines(density(x3log,cut=0,bw=0.01), col=4)
>>
>> Sorry...
>>
>>
>> 'Real' data of this form is usually discrete, with the value of 1 being
>> the most frequent (minimum) event, and higher values occurring less
>> frequently according to a power (power-law). This data can be easily
>> grouped into discrete bins, and frequency plotted on log scales. The
>> continuous data generated above requires some form of density estimation
>> or rescaling into discrete values (make the smallest value equal to 1 and
>> round everything else into an integer).
>>
>> I see the aggregate function, but which function lets me simply count the
>> number of values in a class (integer bin)?
>>
>> The analysis of even the discretized data is made more accurate by the use
>> of exponentially growing bins. This way you don't need to plot the data on
>> log scales, and the increasing variance associated with lower probability
>> events is handled by the increasing bin size (giving good accuracy of
>> power fitting). How can I easily (ignorantly) implement exponentially
>> increasing bin sizes?
>>
>> Thanks for any feedback,
>>
>> Dan.
>>
>> ______________________________________________
>> [EMAIL PROTECTED] mailing list
>> https://www.stat.math.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
>>
>
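
For the two questions in the quoted message (counting values per integer class, and exponentially growing bins), here is a minimal sketch of how ?table and ?cut might be combined. The rescaling step (divide by the minimum, round up) and the doubling breaks 1, 2, 4, ... are my own assumptions for illustration, not something settled in the thread.

```r
set.seed(1)                          # fixed seed, only so the sketch is repeatable
x3 <- runif(10000)^3                 # same generator as in the question

# Assumed discretization: the smallest value maps to 1, everything else
# is rounded up to an integer.
xi <- ceiling(x3 / min(x3))

# Number of values in each integer class.
counts <- table(xi)

# Exponentially growing bins: [1,2), [2,4), [4,8), ... (doubling widths).
k      <- max(1, ceiling(log2(max(xi))))
breaks <- 2^(0:k)
bins   <- cut(xi, breaks = breaks, right = FALSE, include.lowest = TRUE)
n      <- as.vector(table(bins))     # count per exponential bin

# Frequency per unit of bin width, comparable across unequal bins.
freq   <- n / (length(xi) * diff(breaks))
```

Dividing by diff(breaks) matters here: raw counts in doubling bins are not comparable with each other until they are scaled by bin width.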

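
For the continuous sample itself, one alternative to a fixed-bandwidth density() is to bin on the log scale, since equal-width bins in log(x) are exponentially growing bins in x. The sketch below is an illustration under stated assumptions, not a recommendation from the thread: the cutoff of 10 counts per bin is a hypothetical choice to keep sparse bins out of the fit. For runif(n)^3 the density is (1/3) * x^(-2/3) on (0, 1), so the fitted log-log slope should come out close to -2/3.

```r
set.seed(1)                          # fixed seed, only so the sketch is repeatable
x3 <- runif(10000)^3                 # density is (1/3) * x^(-2/3) on (0, 1)

# Histogram of log(x): equal widths here are exponential widths in x.
h     <- hist(log(x3), breaks = 20, plot = FALSE)
width <- diff(exp(h$breaks))         # bin widths back on the x scale
dens  <- h$counts / (length(x3) * width)   # estimated density per unit x

# Hypothetical cutoff: drop sparsely filled bins before fitting.
ok    <- h$counts >= 10
fit   <- lm(log(dens[ok]) ~ h$mids[ok])
slope <- unname(coef(fit)[2])        # estimate of the exponent lambda

# Draw the binned density on log-log axes when run interactively.
if (interactive()) {
  plot(exp(h$mids[ok]), dens[ok], log = "xy", xlab = "x", ylab = "density")
}
```

Because every bin has the same width in log(x), the slope of log(dens) against the log-scale midpoints estimates the exponent directly, without having to choose a bandwidth.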