-roger
Dan Bolser wrote:
On Wed, 16 Jun 2004, Roger D. Peng wrote:
You can use hist(x, br, plot = FALSE)$counts.
-roger
OK, I made this...
y4 <- runif(100000)**4 y4log <- -log(y4) y4bin20 <- hist(y4log,20,plot=FALSE)$counts y4bin20log <- log(y4bin20)
plot(y4bin20log)
for(i in 2:20){ y4 <- runif(100000)**4 y4log <- -log(y4) y4bin20 <- hist(y4log,20,plot=FALSE)$counts y4bin20log <- log(y4bin20) points(y4bin20log,col=i) }
I am seeing something very strange about 1 loop in every five or ten of the above...
Can someone try this and let me know if every fifth run or so they see a totally different trend? Basically looks like only 10 bins are being used at random during the run!?
Hopefully you will also see what I am trying to do, which is to highlight the increased variance in the tail...
How would I estimate the slope of y4bin20log? (sorry for the names).
Thanks for the help, Dan.
Dan Bolser wrote:
First, thanks to everyone who helped me get to grips with R in (x)emacs (I get confused easily). Special thanks to Stephen Eglen for continued support.
My question is about non-linear binning, or density functions over distributions governed by a power law ...
y ~ mu*x**lambda # In one of its forms # (can't find Pareto in the online help)
Looking at the following should show my problem....
x3 <- runif(10000)**3 # Probably a better (correct) way to do this
plot( density(x3,cut=0,bw=0.1)) plot( density(x3,cut=0,bw=0.01)) plot( density(x3,cut=0,bw=0.001))
plot(density(x3,cut=0,bw=0.1), log='xy') plot(density(x3,cut=0,bw=0.01), log='xy') plot(density(x3,cut=0,bw=0.001),log='xy')
The upper three plots show that the bw has a big effect on the appearance of the graph by rescaling based on the initial density at low values of x, which is very high.
The lower plots show (I think) an error in the use of linear bins to view a non linear trend. I would expect this curve to be linear on log-log scales (from experience), and you can see the expected behavior in the tails of these plots.
If you play with drawing these curves on top of each other they look OK apart from at the beginning. However, changing the band width to 0.0001 has a radical effect on these plots, and they begin to show a different trend (look like they are being governed by a different power).
Hmmm....
x3log <- -log(x3)
plot( density(x3log,cut=0,bw=0.5), log='y',col=1)
lines(density(x3log,cut=0,bw=0.2), log='y',col=2) lines(density(x3log,cut=0,bw=0.1), log='y',col=3) lines(density(x3log,cut=0,bw=0.01), log='y',col=4)
Sorry...
'Real' data of this form is usually discrete, with the value of 1 being the most frequent (minimum) event, and higher values occurring less frequently according to a power (power-law). This data can be easily grouped into discrete bins, and frequency plotted on log scales. The continuous data generated above requires some form of density estimation or rescaling into discreet values (make the smallest value equal to 1 and round everything else into an integer).
I see the aggregate function, but which function lets me simply count the number of values in a class (integer bin)?
The analysis of even the discretized data is made more accurate by the use of exponentially growing bins. This way you don't need to plot the data on log scales, and the increasing variance associated with lower probability events is handled by the increasing bin size (giving good accuracy of power fitting). How can I easily (ignorantly) implement exponentially increasing bin sizes?
Thanks for any feedback,
Dan.
______________________________________________ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
-- Roger D. Peng http://www.biostat.jhsph.edu/~rpeng
______________________________________________ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
