[R] Unexpected behavior from hist()

2013-06-13 Thread Mohamed Badawy
Hi... I'm still a beginner in R. While doing some curve-fitting with a raw data 
set of length 22,000, here is what I had:



 hist(y,col="red")

gives me the frequency histogram, 13 total rectangles, highest is near 5000.



Now

 hist(y,prob=TRUE,col="red",ylim=c(0,1.5))



gives me the density (probability?) histogram, same number of rectangles, but 
the highest rectangle is obviously higher than 1, how can this be?!!!



And it gets worse, if I use the 'breaks' option, when I add 'breaks=1000', many 
of the rectangles were higher than 2.

What am I missing here?



Thanks.


P.S. I had to post this thread via email as it got rejected when I posted it from 
Nabble; the reason was "Message rejected by filter rule match".



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Unexpected behavior from hist()

2013-06-13 Thread Sarah Goslee
Hi,

On Thu, Jun 13, 2013 at 11:13 AM, Mohamed Badawy mbad...@pm-engr.com wrote:
> Hi... I'm still a beginner in R. While doing some curve-fitting with a raw
> data set of length 22,000, here is what I had:
>
> hist(y,col="red")
>
> gives me the frequency histogram, 13 total rectangles, highest is near 5000.


You don't provide a reproducible example, so here's some fake data:

somedata <- runif(1000)


> Now
>
> hist(y,prob=TRUE,col="red",ylim=c(0,1.5))
>
> gives me the density (probability?) histogram, same number of rectangles, but
> the highest rectangle is obviously higher than 1, how can this be?!!!

Because you misread the help. Using freq=FALSE (equivalent to
prob=TRUE, which is a legacy option), you are getting:

freq: logical; if ‘TRUE’, the histogram graphic is a representation
  of frequencies, the ‘counts’ component of the result; if
  ‘FALSE’, probability densities, component ‘density’, are
  plotted (so that the histogram has a total area of one).
  Defaults to ‘TRUE’ _if and only if_ ‘breaks’ are equidistant
  (and ‘probability’ is not specified).


It sounds like what you actually want is:

somehist <- hist(somedata, plot=FALSE)
somehist$counts <- somehist$counts/sum(somehist$counts)
plot(somehist)
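
As a quick check of the difference between the two scalings, here is a minimal sketch using the same kind of fake data. The seed and the names `relfreq` and `binwidth` are illustrative additions, not part of the suggestion above.

```r
set.seed(123)                      # arbitrary seed, only for reproducibility
somedata <- runif(1000)

somehist <- hist(somedata, plot = FALSE)

# Relative frequencies: proportion of observations in each bin.
# These are the bar HEIGHTS that add up to 1.
relfreq <- somehist$counts / sum(somehist$counts)
sum(relfreq)

# Densities are relative frequencies divided by the bin width,
# so it is the bar AREAS (height * width) that add up to 1.
binwidth <- diff(somehist$breaks)
sum(somehist$density * binwidth)

# The two scalings agree only when every bin has width 1.
isTRUE(all.equal(somehist$density, relfreq / binwidth, tolerance = 1e-6))
```

Which one to plot depends on the goal: relative frequencies read as proportions per bin, while densities are the scale to use when comparing against a continuous distribution.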

> P.S. I had to post this thread via email as it got rejected as I posted it
> from Nabble, reason was Message rejected by filter rule match

Nabble is not the R-help mailing list. Posting via email is the
correct thing to do.

Sarah

-- 
Sarah Goslee
http://www.functionaldiversity.org



Re: [R] Unexpected behavior from hist()

2013-06-13 Thread David Carlson
Density means that the AREAS of the bars add to 1, not the HEIGHTS
of the bars. You probably have intervals that are less than 1. Eg:

> set.seed(42)
> x <- rpois(1000, 5)/100
> info <- hist(x, prob=TRUE)
> info
$breaks
 [1] 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09 0.10 0.11 0.12 0.13

$counts
 [1]  42  88 151 177 178 131  97  70  43  14   6   2   1

$density
 [1]  4.2  8.8 15.1 17.7 17.8 13.1  9.7  7.0  4.3  1.4  0.6  0.2  0.1

$mids
 [1] 0.005 0.015 0.025 0.035 0.045 0.055 0.065 0.075 0.085 0.095 0.105 0.115
[13] 0.125

$xname
[1] "x"

$equidist
[1] TRUE

attr(,"class")
[1] "histogram"
> diff(info$breaks)*info$density # Areas of each bar
 [1] 0.042 0.088 0.151 0.177 0.178 0.131 0.097 0.070 0.043 0.014 0.006 0.002
[13] 0.001
> sum(diff(info$breaks)*info$density) # Sum of the areas
[1] 1
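
To tie this back to the original question, here is a rough sketch with simulated data, since the original `y` was never posted. The normal distribution, its parameters, and the seed are assumptions chosen only so that most values fall within a range narrower than 1; the names `coarse` and `fine` are likewise illustrative.

```r
set.seed(42)
y <- rnorm(22000, mean = 0.5, sd = 0.15)   # stand-in for the unposted data

coarse <- hist(y, plot = FALSE)                  # default breaks
fine   <- hist(y, breaks = 1000, plot = FALSE)   # a single number is only a suggestion

# Height = count / (n * bin width). With bin widths below 1,
# heights above 1 are expected and perfectly legitimate.
max(coarse$density)

# Narrower bins hold fewer points each, so heights get noisier
# and the tallest bar tends to get taller.
max(fine$density)

# In both cases the total area is still 1.
sum(coarse$density * diff(coarse$breaks))
sum(fine$density * diff(fine$breaks))
```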

-
David L Carlson
Associate Professor of Anthropology
Texas A&M University
College Station, TX 77840-4352




Re: [R] Unexpected behavior from hist()

2013-06-13 Thread R_noob#314159
Yep, that's it. Thanks a lot for the replies.
I guess the point I was struggling with (as I was curve-fitting a distribution 
to sample data) was discrete vs. continuous densities.
But if one wants to model sample densities with a continuous (say, normal) 
distribution, then the histogram should have a total area of 1.
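
For the record, a minimal sketch of that kind of overlay, assuming simulated data in place of the real `y` and a normal fit by sample mean and standard deviation (both are illustrative choices, not the actual curve-fitting from this thread):

```r
set.seed(1)
y <- rnorm(22000, mean = 0.5, sd = 0.15)   # simulated stand-in for the real data

# Compute the histogram first so the y-axis can fit both bars and curve
h <- hist(y, plot = FALSE)
peak <- dnorm(mean(y), mean = mean(y), sd = sd(y))   # height of the fitted curve at its mode

# Density-scale histogram: total area 1, directly comparable to a density curve
plot(h, freq = FALSE, col = "red", ylim = c(0, max(h$density, peak)))

# Overlay the normal density with the sample mean and standard deviation
curve(dnorm(x, mean = mean(y), sd = sd(y)), add = TRUE, lwd = 2)
```

On the frequency scale the same curve would have to be multiplied by the number of observations times the bin width to line up with the bars.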

Best.
