Dear Marco,
I had compared the maximum values with what I was expecting based on a
calculation in Excel. However, I've just run this set of commands to
calculate the area:
out <- hist(StockReturns, probability=TRUE, plot=FALSE)
out
sum(diff(out$breaks) * out$density)
And it seems to have worked (it's come out as 1).
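For anyone else who trips over the same thing, here is a minimal sketch of why the bar heights need not look like probabilities even though the area is 1. It uses made-up data (the name returns and the simulated values are assumptions, not my StockReturns figures): each height is the bin's relative frequency divided by the bin width, so narrow bins can give heights well above 1.

```r
# Hypothetical data standing in for StockReturns (not the values from the post).
set.seed(1)
returns <- rnorm(90, mean = 0.04, sd = 0.08)

# Compute the histogram without drawing it.
h <- hist(returns, freq = FALSE, plot = FALSE)

widths <- diff(h$breaks)            # width of each bin
area   <- sum(widths * h$density)   # total area under the histogram
rel    <- h$counts / sum(h$counts)  # relative frequency of each bin

# density = relative frequency / bin width, so a bar can be taller than 1
# whenever its bin is narrower than 1.
print(area)
print(all.equal(h$density, rel / widths))
print(max(h$density))
```

So the thing to compare against a spreadsheet is widths * h$density (the relative frequencies), not the heights themselves.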
Looking at it in more detail, I have found a mistake in the Excel sheet
that I was calculating in parallel (ironically, to try to make sure that I
did not make any errors with R, which is new to me!). So, as expected, I was
being fantastically thick: I actually had it all working about 4 hours ago
and have been trying to fix a non-broken thing since then!
Many thanks for your help, and sorry for wasting your time,
Alex
On 05/04/06, Marco Geraci <[EMAIL PROTECTED]> wrote:
>
> Hi,
> how did you evaluate the total area?
> Here is a simple example
>
> ###
> set.seed(100)
> x <- rnorm(100)
> x.h <- hist(x, freq=FALSE, plot=FALSE)
>
> > x.h
> $breaks
> [1] -2.5 -2.0 -1.5 -1.0 -0.5 0.0 0.5 1.0 1.5 2.0 2.5 3.0
>
> $counts
> [1] 3 4 9 14 22 20 13 7 5 2 1
>
> $intensities
> [1] 0.05999999 0.08000000 0.18000000 0.28000000 0.44000000 0.40000000
> [7] 0.26000000 0.14000000 0.10000000 0.04000000 0.02000000
>
> $density
> [1] 0.05999999 0.08000000 0.18000000 0.28000000 0.44000000 0.40000000
> [7] 0.26000000 0.14000000 0.10000000 0.04000000 0.02000000
>
> $mids
> [1] -2.25 -1.75 -1.25 -0.75 -0.25 0.25 0.75 1.25 1.75 2.25 2.75
>
> $xname
> [1] "x"
>
> $equidist
> [1] TRUE
>
> attr(,"class")
> [1] "histogram"
>
> > sum(diff(x.h$breaks)*x.h$density)
> [1] 1
>
> # Also, you can verify
>
> > diff(x.h$breaks)*x.h$density*100
> [1] 2.999999 4.000000 9.000000 14.000000 22.000000 20.000000 13.000000
> [8] 7.000000 5.000000 2.000000 1.000000
>
> HTH
> Marco
>
>
> --- Alex Davies <[EMAIL PROTECTED]> wrote:
>
> > Dear All,
> >
> > I am an undergraduate using R for the first time. It seems like an
> > excellent program and one that I look forward to using a lot over the
> > next few years, but I have hit a very basic problem that I can't solve.
> >
> > I want to produce a standardised histogram, i.e. one where the area
> > under the graph is equal to 1. I looked at the manual for the histogram
> > function and found this:
> >
> > freq: logical; if 'TRUE', the histogram graphic is a representation
> >       of frequencies, the 'counts' component of the result; if
> >       'FALSE', probability densities, component 'density', are
> >       plotted (so that the histogram has a total area of one).
> >       Defaults to 'TRUE' _iff_ 'breaks' are equidistant (and
> >       'probability' is not specified).
> >
> > I therefore expect that the following command:
> >
> > > h <- hist(StockReturns, freq=FALSE)
> >
> > where StockReturns has the following data in it:
> >
> > > sourcedata$StockReturns
> > [1] -0.006983 0.111565 0.053782 0.027966 0.068956 0.165424 -0.022133
> > [8] -0.001910 0.052174 0.072589 -0.023002 0.000521 -0.015688 0.148459
> > [15] 0.054111 0.141044 0.096686 -0.012256 -0.030397 0.039365 0.021407
> > [22] -0.175750 0.053901 -0.095730 0.129717 0.333333 0.061563 0.085052
> > [29] 0.072295 -0.008500 0.100000 0.020000 -0.199763 0.081856 0.013636
> > [36] 0.007812 0.038647 -0.026945 0.037965 -0.079889 0.056234 -0.083333
> > [43] -0.012792 0.131711 0.015996 0.008149 0.104568 0.004046 -0.027750
> > [50] 0.050802 0.045714 0.092327 -0.017857 0.022574 0.083333 0.051366
> > [57] 0.004215 0.083228 0.046803 0.021335 0.023797 0.094891 0.036541
> > [64] 0.016423 -0.126365 0.034219 0.098330 0.079292 -0.009901 0.021559
> > [71] -0.039414 0.114286 0.101856 -0.010452 0.111111 0.097274 0.104843
> > [78] 0.144439 0.021868 0.106667 0.081250 0.002097 0.073302 0.087889
> > [85] -0.145165 0.014592 0.035000 0.131711 -0.126937 0.133989
> >
> > would result in a graph that has an area equal to 1.000. However, it
> > does not - it produces frequency densities, not standardized frequency
> > densities. Can someone point me in the right direction here? I know I
> > am being fantastically thick, but I can't find out how to do such a
> > simple operation!
> >
> > My complete set of commands looks like this:
> >
> > > sourcedata <- read.table("c:/data.dat", header=TRUE)
> > > attach(sourcedata)
> > > h <- hist(StockReturns, col='red', labels=TRUE,
> > +           ylab="Frequency Density", probability=TRUE)
> >
> > where c:\data.dat is a file containing the numbers above, one per
> > line, with the first line containing the string "StockReturns".
> >
> > Many thanks,
> >
> > Alex Davies
> >
> > ______________________________________________
> > [email protected] mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide!
> > http://www.R-project.org/posting-guide.html
> >
>
>
>
--
Alex Davies // http://www.davz.net