Re: [R] non-linear binning? power-law in R

Roger D. Peng Wed, 16 Jun 2004 07:58:53 -0700

When you specify a single number to the `br' argument of `hist' it is only a suggestion. See the help page, ?hist. If you want to use the same breakpoints everytime, you need to specify the locations explicity.

-roger

Dan Bolser wrote:

On Wed, 16 Jun 2004, Roger D. Peng wrote:

You can use hist(x, br, plot = FALSE)$counts.

-roger

OK, I made this...


y4 <- runif(100000)**4
y4log <- -log(y4)
y4bin20 <- hist(y4log,20,plot=FALSE)$counts
y4bin20log <- log(y4bin20)

plot(y4bin20log)

for(i in 2:20){
 y4 <- runif(100000)**4
 y4log <- -log(y4)
 y4bin20 <- hist(y4log,20,plot=FALSE)$counts
 y4bin20log <- log(y4bin20)
 points(y4bin20log,col=i)
}


I am seeing something very strange about 1 loop in every five or ten of
the above...

Can someone try this and let me know if every fifth run or so they see a
totally different trend? Basically looks like only 10 bins are being used
at random during the run!?

Hopefully you will also see what I am trying to do, which is to highlight
the increased variance in the tail...

How would I estimate the slope of y4bin20log? (sorry for the names).

Thanks for the help,
Dan.

Dan Bolser wrote:

First, thanks to everyone who helped me get to grips with R in (x)emacs
(I get confused easily). Special thanks to Stephen Eglen for continued
support.

My question is about non-linear binning, or density functions over
distributions governed by a power law ...

y ~ mu*x**lambda # In one of its forms # (can't find Pareto in the online help)

Looking at the following should show my problem....

x3 <- runif(10000)**3        # Probably a better (correct) way to do this

plot( density(x3,cut=0,bw=0.1))
plot( density(x3,cut=0,bw=0.01))
plot( density(x3,cut=0,bw=0.001))

plot(density(x3,cut=0,bw=0.1),  log='xy')
plot(density(x3,cut=0,bw=0.01), log='xy')
plot(density(x3,cut=0,bw=0.001),log='xy')

The upper three plots show that the bw has a big effect on the appearance
of the graph by rescaling based on the initial density at low values of x,
which is very high.

The lower plots show (I think) an error in the use of linear bins to view
a non linear trend. I would expect this curve to be linear on log-log
scales (from experience), and you can see the expected behavior in the
tails of these plots.

If you play with drawing these curves on top of each other they look OK
apart from at the beginning. However, changing the band width to 0.0001 has
a radical effect on these plots, and they begin to show a different trend
(look like they are being governed by a different power).

Hmmm....

x3log <- -log(x3)

plot( density(x3log,cut=0,bw=0.5),  log='y',col=1)

lines(density(x3log,cut=0,bw=0.2),  log='y',col=2)
lines(density(x3log,cut=0,bw=0.1),  log='y',col=3)
lines(density(x3log,cut=0,bw=0.01), log='y',col=4)

Sorry...


'Real' data of this form is usually discrete, with the value of 1 being
the most frequent (minimum) event, and higher values occurring less
frequently according to a power (power-law). This data can be easily
grouped into discrete bins, and frequency plotted on log scales. The
continuous data generated above requires some form of density estimation
or rescaling into discreet values (make the smallest value equal to 1 and
round everything else into an integer).

I see the aggregate function, but which function lets me simply count the
number of values in a class (integer bin)?

The analysis of even the discretized data is made more accurate by the use
of exponentially growing bins. This way you don't need to plot the data on
log scales, and the increasing variance associated with lower probability
events is handled by the increasing bin size (giving good accuracy of
power fitting). How can I easily (ignorantly) implement exponentially
increasing bin sizes?

Thanks for any feedback,

Dan.

______________________________________________
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


--
Roger D. Peng
http://www.biostat.jhsph.edu/~rpeng

______________________________________________
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

Re: [R] non-linear binning? power-law in R

Reply via email to