Yep, that's it. Thanks a lot for the replies. I guess the point I was struggling with (as I was curve-fitting a distribution to sample data) was the difference between discrete and continuous densities. If one wants to model the sample with a continuous distribution, say a normal, then the histogram should have a total area of 1.
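As a minimal sketch of that point, assuming simulated normal data in place of the real data set (the names y, fit_mean, fit_sd and y_small are only illustrative, and the mean and sd are made up): a density-scaled histogram has bars whose areas sum to 1, so a fitted normal density can be drawn directly on top of it, and the bars can still be taller than 1 when the bins are narrower than 1.

```r
# Simulated stand-in for the real data (assumption: roughly normal, n = 22000)
set.seed(1)
y <- rnorm(22000, mean = 10, sd = 2)

# Density scale: bar heights are counts / (n * bin width), so the areas sum to 1
h <- hist(y, freq = FALSE, col = "red")
total_area <- sum(diff(h$breaks) * h$density)

# Fitted normal density drawn on the same scale as the bars
fit_mean <- mean(y)
fit_sd <- sd(y)
curve(dnorm(x, mean = fit_mean, sd = fit_sd), add = TRUE, lwd = 2)

# Rescaling the data makes the bins narrower than 1, so bar heights exceed 1
# while the total area is still 1
y_small <- y / 100
h_small <- hist(y_small, freq = FALSE)
small_area <- sum(diff(h_small$breaks) * h_small$density)
tallest_bar <- max(h_small$density)
```

Comparing total_area and small_area with all.equal(..., 1) avoids being misled by floating-point rounding.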
Best.

From: David Carlson [via R] [mailto:ml-node+s789695n466946...@n4.nabble.com]
Sent: Thursday, June 13, 2013 10:58 AM
To: Mohamed Badawy
Subject: Re: Unexpected behavior from hist()

Density means that the AREAS of the bars add to 1, not the HEIGHTS of the bars. You probably have intervals that are less than 1. Eg:

> set.seed(42)
> x <- rpois(1000, 5)/100
> info <- hist(x, prob=TRUE)
> info
$breaks
 [1] 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09 0.10 0.11 0.12 0.13

$counts
 [1]  42  88 151 177 178 131  97  70  43  14   6   2   1

$density
 [1]  4.2  8.8 15.1 17.7 17.8 13.1  9.7  7.0  4.3  1.4  0.6  0.2  0.1

$mids
 [1] 0.005 0.015 0.025 0.035 0.045 0.055 0.065 0.075 0.085 0.095 0.105 0.115
[13] 0.125

$xname
[1] "x"

$equidist
[1] TRUE

attr(,"class")
[1] "histogram"

> diff(info$breaks)*info$density # Areas of each bar
 [1] 0.042 0.088 0.151 0.177 0.178 0.131 0.097 0.070 0.043 0.014 0.006 0.002
[13] 0.001
> sum(diff(info$breaks)*info$density) # Sum of the areas
[1] 1

-------------------------------------
David L Carlson
Associate Professor of Anthropology
Texas A&M University
College Station, TX 77840-4352

-----Original Message-----
From: [hidden email] [mailto:[hidden email]] On Behalf Of Sarah Goslee
Sent: Thursday, June 13, 2013 10:36 AM
To: Mohamed Badawy
Cc: [hidden email]
Subject: Re: [R] Unexpected behavior from hist()

Hi,

On Thu, Jun 13, 2013 at 11:13 AM, Mohamed Badawy <[hidden email]> wrote:
> Hi... I'm still a beginner in R. While doing some curve-fitting with a raw data set of length 22,000, here is what I had:
>
>> hist(y,col="red")
>
> gives me the frequency histogram, 13 total rectangles, highest is near 5000.

You don't provide a reproducible example, so here's some fake data:

somedata <- runif(1000)

> Now
>
>> hist(y,prob=TRUE,col="red",ylim=c(0,1.5))
>
> gives me the density (probability?) histogram, same number of rectangles, but the highest rectangle is obviously higher than 1, how can this be?!!!

Because you misread the help. Using freq=FALSE (equivalent to prob=TRUE, which is a legacy option), you are getting:

    freq: logical; if 'TRUE', the histogram graphic is a representation
          of frequencies, the 'counts' component of the result; if
          'FALSE', probability densities, component 'density', are
          plotted (so that the histogram has a total area of one).
          Defaults to 'TRUE' _if and only if_ 'breaks' are equidistant
          (and 'probability' is not specified).

It sounds like what you actually want is:

somehist <- hist(somedata, plot=FALSE)
somehist$counts <- somehist$counts/sum(somehist$counts)
plot(somehist)

> P.S. I had to post this thread via email as it got rejected as I posted it from Nabble, reason was "Message rejected by filter rule match"

Nabble is not the R-help mailing list. Posting via email is the correct thing to do.

Sarah

--
Sarah Goslee
http://www.functionaldiversity.org

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
--
View this message in context: http://r.789695.n4.nabble.com/Unexpected-behavior-from-hist-tp4669457p4669468.html
Sent from the R help mailing list archive at Nabble.com.