[R] Unexpected behavior from hist()
Hi... I'm still a beginner in R. While doing some curve-fitting with a raw data set of length 22,000, here is what I had: hist(y,col="red") gives me the frequency histogram, with 13 rectangles in total; the highest is near 5000.

Now hist(y,prob=TRUE,col="red",ylim=c(0,1.5)) gives me the density (probability?) histogram with the same number of rectangles, but the highest rectangle is clearly higher than 1. How can this be?!!!

And it gets worse: when I add the 'breaks' option, e.g. breaks=1000, many of the rectangles are higher than 2. What am I missing here?

Thanks.

P.S. I had to post this thread via email because it was rejected when I posted it from Nabble; the reason given was "Message rejected by filter rule match".

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
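A minimal sketch of the symptom, assuming simulated data because the original y was not posted: the values below are Poisson counts divided by 10, i.e. hypothetical measurements recorded on a 0.1 grid. For each bar, hist() reports density = count / (n * bin width), so a bar is taller than 1 whenever its bin is narrower than the share of the data it holds, and a larger `breaks` suggestion makes the bins narrower.

```r
# Hypothetical stand-in for the poster's y: 22,000 values on a 0.1 grid
set.seed(1)
y <- rpois(22000, 5) / 10

# plot = FALSE returns the "histogram" object without drawing it
coarse <- hist(y, plot = FALSE)                 # default (Sturges) breaks
fine   <- hist(y, breaks = 1000, plot = FALSE)  # a single number is only a suggestion

# Each bar's height is count / (n * bin width), so it is not bounded by 1
n <- length(y)
stopifnot(isTRUE(all.equal(coarse$density,
                           coarse$counts / (n * diff(coarse$breaks)))))

# Narrower bins around the same repeated values give taller bars...
max(coarse$density)
max(fine$density)

# ...while the total area of the bars stays 1 in both cases
sum(coarse$density * diff(coarse$breaks))
sum(fine$density * diff(fine$breaks))
```

The very tall bars with breaks=1000 come from the data sitting on a grid: each tiny bin captures a whole stack of identical values. For a truly continuous variable, narrower bins do not make the bars systematically taller; the heights approach the underlying density instead.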
Re: [R] Unexpected behavior from hist()
Hi,

On Thu, Jun 13, 2013 at 11:13 AM, Mohamed Badawy <mbad...@pm-engr.com> wrote:
> Hi... I'm still a beginner in R. While doing some curve-fitting with a raw data set of length 22,000, here is what I had: hist(y,col="red") gives me the frequency histogram, 13 total rectangles, highest is near 5000.

You don't provide a reproducible example, so here's some fake data:

somedata <- runif(1000)

> Now hist(y,prob=TRUE,col="red",ylim=c(0,1.5)) gives me the density (probability?) histogram, same number of rectangles, but the highest rectangle is obviously higher than 1, how can this be?!!!

Because you misread the help. Using freq=FALSE (equivalent to prob=TRUE, which is a legacy option), you are getting:

freq: logical; if ‘TRUE’, the histogram graphic is a representation of frequencies, the ‘counts’ component of the result; if ‘FALSE’, probability densities, component ‘density’, are plotted (so that the histogram has a total area of one). Defaults to ‘TRUE’ _if and only if_ ‘breaks’ are equidistant (and ‘probability’ is not specified).

It sounds like what you actually want is:

somehist <- hist(somedata, plot=FALSE)
somehist$counts <- somehist$counts/sum(somehist$counts)
plot(somehist)

> P.S. I had to post this thread via email as it got rejected as I posted it from Nabble, reason was Message rejected by filter rule match

Nabble is not the R-help mailing list. Posting via email is the correct thing to do.

Sarah

-- 
Sarah Goslee
http://www.functionaldiversity.org
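The two rescalings in Sarah's reply differ only by the bin width. A minimal sketch reusing her runif() fake data; the set.seed() call and the name relfreq are additions for illustration, not part of her code. Dividing the counts by their total gives bar heights that sum to 1, while each density height is that proportion divided by the bin width, so it is the bar areas that sum to 1.

```r
set.seed(123)                      # added for reproducibility
somedata <- runif(1000)

somehist <- hist(somedata, plot = FALSE)

# Relative frequencies: the bar HEIGHTS sum to 1
relfreq <- somehist$counts / sum(somehist$counts)
sum(relfreq)

# Densities: height = proportion / bin width, so the bar AREAS sum to 1
widths <- diff(somehist$breaks)
stopifnot(isTRUE(all.equal(somehist$density, relfreq / widths)))
sum(somehist$density * widths)

# Sarah's approach: overwrite the counts, then plot them as bar heights
# (with equidistant breaks, plot() draws the counts component by default)
somehist$counts <- relfreq
plot(somehist, ylab = "Relative frequency")
```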
Re: [R] Unexpected behavior from hist()
Density means that the AREAS of the bars add to 1, not the HEIGHTS of the bars. You probably have intervals that are narrower than 1. E.g.:

set.seed(42)
x <- rpois(1000, 5)/100
info <- hist(x, prob=TRUE)
info
$breaks
 [1] 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09 0.10 0.11 0.12 0.13

$counts
 [1] 42 88 151 177 178 131 97 70 43 14 6 2 1

$density
 [1] 4.2 8.8 15.1 17.7 17.8 13.1 9.7 7.0 4.3 1.4 0.6 0.2 0.1

$mids
 [1] 0.005 0.015 0.025 0.035 0.045 0.055 0.065 0.075 0.085 0.095 0.105 0.115
[13] 0.125

$xname
[1] "x"

$equidist
[1] TRUE

attr(,"class")
[1] "histogram"

diff(info$breaks)*info$density # Areas of each bar
 [1] 0.042 0.088 0.151 0.177 0.178 0.131 0.097 0.070 0.043 0.014 0.006 0.002
[13] 0.001
sum(diff(info$breaks)*info$density) # Sum of the areas
[1] 1

-------------------------------------
David L Carlson
Associate Professor of Anthropology
Texas A&M University
College Station, TX 77840-4352

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Sarah Goslee
Sent: Thursday, June 13, 2013 10:36 AM
To: Mohamed Badawy
Cc: r-help@r-project.org
Subject: Re: [R] Unexpected behavior from hist()
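David's area check also covers the case mentioned in the help text quoted by Sarah: with unequal bin widths, hist() plots densities by default, and the areas still add to 1. A minimal sketch on David's data with hypothetical, deliberately unequal breakpoints; the last break, 0.20, assumes the data stay below 0.2, which holds for his seed (his largest break is 0.13).

```r
set.seed(42)
x <- rpois(1000, 5)/100                 # David's example data

# Hypothetical, deliberately unequal breakpoints that span the data
brks <- c(0, 0.02, 0.04, 0.07, 0.10, 0.20)
info <- hist(x, breaks = brks, plot = FALSE)

info$equidist                           # FALSE, so plot(info) would draw densities
areas <- diff(info$breaks) * info$density
areas                                   # each area is the share of the data in that bar
sum(areas)                              # the areas still add to 1
```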
Re: [R] Unexpected behavior from hist()
Yep, that's it. Thanks a lot for the replies I got. I guess the point I was struggling with (as I was fitting a distribution curve to sample data) is the difference between discrete and continuous densities. But if one wants to model sample densities with a continuous distribution, say a normal, then the histogram should have a total area of 1.

Best.

From: David Carlson [via R] [mailto:ml-node+s789695n466946...@n4.nabble.com]
Sent: Thursday, June 13, 2013 10:58 AM
To: Mohamed Badawy
Subject: Re: Unexpected behavior from hist()
-- 
View this message in context: http://r.789695.n4.nabble.com/Unexpected-behavior-from-hist-tp4669457p4669468.html
Sent from the R help mailing list archive at Nabble.com.
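Mohamed's closing point can be sketched as follows: draw the histogram on the density scale (freq = FALSE) so its total area is 1, then overlay a fitted normal density. This assumes simulated data in place of the poster's y and uses the sample mean and standard deviation as plug-in estimates, a simple choice for illustration rather than a method proposed in the thread.

```r
set.seed(7)
y <- rnorm(22000, mean = 3, sd = 0.2)  # hypothetical stand-in for the real data

# Density scale: bar areas sum to 1, comparable to a probability density
h <- hist(y, freq = FALSE, col = "red",
          main = "Density histogram with normal fit")

# Plug-in estimates (sd() divides by n - 1; the normal MLE would use n)
mu    <- mean(y)
sigma <- sd(y)
curve(dnorm(x, mean = mu, sd = sigma), add = TRUE, lwd = 2)

# Total bar area is 1, matching the area under the fitted curve
sum(h$density * diff(h$breaks))
```

On the frequency scale the bars would be thousands of units tall while the normal curve peaks near 2, so the overlay only lines up once the histogram is on the density scale.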