Re: [R] hist function: freq=FALSE for standardised histograms

Alex Davies Wed, 05 Apr 2006 13:26:16 -0700

Dear Marco,

I compared the maximum values with what I was expecting based on a
calculation in Excel. However, I've just run this set of commands to calculate
the area:


out <- hist(StockReturns, probability=TRUE, plot=FALSE)
out
# 'density' holds the same values as 'intensities' (see the output quoted below)
sum(diff(out$breaks) * out$density)

And it seems to have worked (it's come out as 1).
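
The same check can be sketched on simulated data, including the case of unequal bin widths. This is only a rough illustration: the vector `returns` is made up to stand in for StockReturns, and the quantile-based cut points are an arbitrary choice for the example.

```r
set.seed(1)
# Simulated stand-in data; these are NOT the StockReturns values from this thread
returns <- rnorm(90, mean = 0.03, sd = 0.08)

# Equal-width bins chosen by hist()
h_equal <- hist(returns, probability = TRUE, plot = FALSE)
area_equal <- sum(diff(h_equal$breaks) * h_equal$density)

# Unequal-width bins (illustrative quantile cut points spanning the data)
brk <- unname(quantile(returns, probs = c(0, 0.1, 0.5, 0.6, 1)))
h_unequal <- hist(returns, breaks = brk, probability = TRUE, plot = FALSE)
area_unequal <- sum(diff(h_unequal$breaks) * h_unequal$density)

print(c(equal = area_equal, unequal = area_unequal))
```

In both cases the area is the sum of width times height over the bars, so it is the widths, not the heights alone, that make the total come to one.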

Looking at it in more detail, I have found a mistake in the Excel sheet that
I was calculating in parallel (ironically, to try to make sure that I did not
make any errors with R, which is new to me!). So, as expected, I was being
fantastically thick: I actually had it all working about 4 hours ago and have
been trying to fix a non-broken thing since then!
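
For a cross-check of this kind, the hand calculation amounts to dividing each bin count by the number of observations times the bin width. A minimal sketch, again on simulated data rather than StockReturns (the names `x` and `by_hand` are made up for the example):

```r
set.seed(2)
# Simulated stand-in data; NOT the values from this thread
x <- rnorm(90, mean = 0.03, sd = 0.08)

h <- hist(x, plot = FALSE)

# Height of each bar: count / (number of observations * bin width)
width   <- diff(h$breaks)
by_hand <- h$counts / (sum(h$counts) * width)

# Compare with what hist() reports (loose tolerance to allow for rounding)
print(all.equal(by_hand, h$density, tolerance = 1e-6))

# Bar heights are densities, not probabilities, so they may exceed 1
# when the bins are narrow
print(max(h$density))
```

The bar heights do not themselves have to sum to one, so comparing maximum heights against a spreadsheet only works if the spreadsheet also divides by the bin width.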

Many thanks for your help and sorry for wasting your time,

Alex

On 05/04/06, Marco Geraci <[EMAIL PROTECTED]> wrote:
>
> Hi,
> how did you evaluate the total area?
> Here is a simple example
>
> ###
> set.seed(100)
> x <- rnorm(100)
> x.h <- hist(x, freq=FALSE, plot=FALSE)
>
> > x.h
> $breaks
> [1] -2.5 -2.0 -1.5 -1.0 -0.5  0.0  0.5  1.0  1.5  2.0  2.5  3.0
>
> $counts
> [1]  3  4  9 14 22 20 13  7  5  2  1
>
> $intensities
> [1] 0.05999999 0.08000000 0.18000000 0.28000000 0.44000000 0.40000000
> [7] 0.26000000 0.14000000 0.10000000 0.04000000 0.02000000
>
> $density
> [1] 0.05999999 0.08000000 0.18000000 0.28000000 0.44000000 0.40000000
> [7] 0.26000000 0.14000000 0.10000000 0.04000000 0.02000000
>
> $mids
> [1] -2.25 -1.75 -1.25 -0.75 -0.25  0.25  0.75  1.25  1.75  2.25  2.75
>
> $xname
> [1] "x"
>
> $equidist
> [1] TRUE
>
> attr(,"class")
> [1] "histogram"
>
> > sum(diff(x.h$breaks)*x.h$density)
> [1] 1
>
> # Also, you can verify
>
> > diff(x.h$breaks)*x.h$density*100
> [1]  2.999999  4.000000  9.000000 14.000000 22.000000 20.000000 13.000000
> [8]  7.000000  5.000000  2.000000  1.000000
>
> HTH
> Marco
>
>
> --- Alex Davies <[EMAIL PROTECTED]> wrote:
>
> > Dear All,
> >
> > I am an undergraduate using R for the first time. It seems like an excellent
> > program and one that I look forward to using a lot over the next few years,
> > but I have hit a very basic problem that I can't solve.
> >
> > I want to produce a standardised histogram, i.e. one where the area under
> > the graph is equal to 1. I look at the manual for the histogram function and
> > find this:
> >
> >     freq: logical; if 'TRUE', the histogram graphic is a representation
> >           of frequencies, the 'counts' component of the result; if
> >           'FALSE', probability densities, component 'density', are
> >           plotted (so that the histogram has a total area of one).
> >           Defaults to 'TRUE' _iff_ 'breaks' are equidistant (and
> >           'probability' is not specified).
> >
> > I therefore expect that the following command:
> >
> > > h <- hist(StockReturns, freq=FALSE)
> >
> > where StockReturns has the following data in it:
> >
> > > sourcedata$StockReturns
> >  [1] -0.006983  0.111565  0.053782  0.027966  0.068956  0.165424 -0.022133
> >  [8] -0.001910  0.052174  0.072589 -0.023002  0.000521 -0.015688  0.148459
> > [15]  0.054111  0.141044  0.096686 -0.012256 -0.030397  0.039365  0.021407
> > [22] -0.175750  0.053901 -0.095730  0.129717  0.333333  0.061563  0.085052
> > [29]  0.072295 -0.008500  0.100000  0.020000 -0.199763  0.081856  0.013636
> > [36]  0.007812  0.038647 -0.026945  0.037965 -0.079889  0.056234 -0.083333
> > [43] -0.012792  0.131711  0.015996  0.008149  0.104568  0.004046 -0.027750
> > [50]  0.050802  0.045714  0.092327 -0.017857  0.022574  0.083333  0.051366
> > [57]  0.004215  0.083228  0.046803  0.021335  0.023797  0.094891  0.036541
> > [64]  0.016423 -0.126365  0.034219  0.098330  0.079292 -0.009901  0.021559
> > [71] -0.039414  0.114286  0.101856 -0.010452  0.111111  0.097274  0.104843
> > [78]  0.144439  0.021868  0.106667  0.081250  0.002097  0.073302  0.087889
> > [85] -0.145165  0.014592  0.035000  0.131711 -0.126937  0.133989
> >
> > would result in a graph that has an area equal to 1.000. However, it does
> > not: it produces frequency densities, not standardised frequency densities.
> > Can someone point me in the right direction here? I know I am being
> > fantastically thick but can't find out how to do such a simple operation!
> >
> > My complete set of commands looks like this:
> >
> > > sourcedata <- read.table("c:/data.dat", header=TRUE)
> > > attach(sourcedata)
> > > h <- hist(StockReturns, col='red', labels=TRUE,
> > +           ylab="Frequency Density", probability=TRUE)
> >
> > where c:\data.dat is a file with the numbers above in it, one per line, and
> > the first line containing the string "StockReturns".
> >
> > Many thanks,
> >
> > Alex Davies
> >
> >
> > ______________________________________________
> > [email protected] mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide!
> > http://www.R-project.org/posting-guide.html
> >
>



--
Alex Davies // http://www.davz.net


