Re: [R] ggplot2 histograms... a subtle error found

2010-08-09 Thread hadley wickham
> When ggplot2 verifies the widths before stacking (the default position for
> histograms), it computes the widths from the minimum and maximum values for
> each bin.  However, because the width of the bins (0.28) is much smaller
> than the scale of the edges (6.8e+09), there is some underflow and the
> widths don't all come out equal:
>
> # in ggplot2::collide
> with(data, xmax-xmin)
> # [1] 0.2799988 0.2799988 0.287 0.2799988 0.2799988 0.2799988 0.287 0.2799988 0.2799988
> #[10] 0.2799988 0.287 0.2799988 0.2799988 0.2799988 0.287 0.2799988 0.2799988 0.287
> #[19] 0.2799988 0.2799988 0.2799988 0.287 0.2799988 0.2799988 0.2799988 0.287 0.2799988
> #[28] 0.2799988 0.2799988 0.287 0.2799988 0.2799988
>
> unique(with(data, xmax - xmin))
> #[1] 0.2799988 0.287
>
> So ggplot2 concludes the widths are not equal and gives the error you see.

Well, what I actually check is length(widths) > 1 && sd(widths) >
1e-6, but in this case sd(widths) is 1.35e-06, just over my threshold.
 I could change this, but that already seems like a fairly
conservative check to me, and I don't know enough about floating point
to be sure of the consequences of raising it further.
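
For readers following along, here is a minimal sketch of the fixed-threshold check described above, next to one possible alternative that scales the tolerance with the magnitude of the bin edges. Both function names, the n_ulp parameter, and the 6.8e10 test magnitude are hypothetical and chosen only for illustration; the second function is an assumption about how such a check could look, not ggplot2's code.

```r
# Fixed-threshold check, following the description in this message.
widths_differ_abs <- function(widths, tol = 1e-6) {
  length(widths) > 1 && sd(widths) > tol
}

# Hypothetical alternative (an assumption, not ggplot2 code): treat the
# widths as equal when their spread is within a few units of round-off
# at the magnitude of the bin edges.
widths_differ_ulp <- function(xmin, xmax, n_ulp = 4) {
  widths <- xmax - xmin
  if (length(widths) < 2) return(FALSE)
  noise <- n_ulp * .Machine$double.eps * max(abs(c(xmin, xmax)))
  diff(range(widths)) > noise
}

# Illustrative edges at 6.8e10, ten times the magnitude discussed in the
# thread, so that this simple construction trips the fixed threshold.
edges <- 6.8e10 + 0.28 * (0:30)
xmin  <- edges[-length(edges)]
xmax  <- edges[-1]

widths_differ_abs(xmax - xmin)        # TRUE: round-off alone exceeds 1e-6
widths_differ_ulp(xmin, xmax)         # FALSE: spread is within round-off
widths_differ_ulp(c(0, 1), c(1, 3))   # TRUE: widths 1 and 2 really differ
```

Whether a magnitude-aware tolerance like this is safe in general is exactly the open question raised above, so treat it as a starting point for discussion only.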

Hadley

-- 
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] ggplot2 histograms... a subtle error found

2010-08-02 Thread Brian Diggs

On 7/28/2010 5:04 PM, Mike Williamson wrote:

> Hello all,
>
>     I have a peculiar and particular bug that I stumbled across with
> ggplot2.  I cannot seem to replicate it with anything other than my specific
> data set.
>
>     Here is the problem:
>
>    - when I try to plot a histogram, allowing for ggplot2 to decide the
>      binwidths itself, I get the following error:
>       - stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to
>         adjust this.
>       - Error: position_stack requires constant width
>
>     My code is simply:
>
> ggplot(data=myDataSet, aes(x=myVarOI)) + geom_histogram()
>
> or
>
> qplot(myDataSet$myVarOI)
>
>     If I go ahead and set the binwidth to some value, then the plot can be
> made without problems.
>
>     The problem is with the specific data that it is trying to plot.  I
> suspect it is trying to create bins of different sizes, from the error
> code.  Here are the basics of my data set:
>
>    - length:  1936 entries
>    - 1906 unique entries
>    - stats:
>           Min.   1st Qu.    Median      Mean   3rd Qu.      Max.
>      3.200e+09 6.312e+09 6.591e+09 6.874e+09 7.551e+09 1.083e+10
>
>     I cannot imagine this can be solved without my specifically uploading
> the actual data.  If I simply attach it, will it be received by r-help?
> Hadley, if you're interested, would you like me to send you the data
> directly to you?


I can reproduce it with generic data.  The problem is one of underflow.

ggplot(data=mtcars, aes(x=6.8e+09 + qsec)) + geom_histogram()
#stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
#Error during wrapup: position_stack requires constant width

When ggplot2 verifies the widths before stacking (the default position 
for histograms), it computes the widths from the minimum and maximum 
values for each bin.  However, because the width of the bins (0.28) is 
much smaller than the scale of the edges (6.8e+09), there is some 
underflow and the widths don't all come out equal:


# in ggplot2::collide
with(data, xmax-xmin)
# [1] 0.2799988 0.2799988 0.287 0.2799988 0.2799988 0.2799988 0.287 0.2799988 0.2799988
#[10] 0.2799988 0.287 0.2799988 0.2799988 0.2799988 0.287 0.2799988 0.2799988 0.287
#[19] 0.2799988 0.2799988 0.2799988 0.287 0.2799988 0.2799988 0.2799988 0.287 0.2799988
#[28] 0.2799988 0.2799988 0.287 0.2799988 0.2799988

unique(with(data, xmax - xmin))
#[1] 0.2799988 0.287

So ggplot2 concludes the widths are not equal and gives the error you 
see.  I don't think this is a bug; you are operating at the edge of what 
the floating point precision will allow, and seem to have crossed that 
edge in this case.  (I suppose ggplot2 could carry the information that 
the bins are created with equal widths and then not have to check that 
later, but that seems unnecessary overhead.)
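
To make the precision argument concrete, here is a small base-R sketch of how coarse doubles are near 6.8e+09. The helper name ulp and the evenly spaced edges are assumptions for illustration; ggplot2 builds its bins differently, so the exact widths will not match the output above. What is called underflow here is perhaps more precisely round-off: each edge is rounded to the nearest representable double, and the gap between neighbouring doubles at this magnitude is not negligible next to a bin width of 0.28.

```r
# Gap between adjacent doubles near x. Doubles carry 53 significant
# bits, so the gap is 2^(binary exponent of x - 52).
ulp <- function(x) 2^(floor(log2(abs(x))) - 52)

edge  <- 6.8e9   # magnitude of the bin edges in the example
width <- 0.28    # approximate bin width

ulp(edge)          # 2^-20, about 9.5e-07
ulp(edge) / width  # round-off in one edge relative to the bin width

# Evenly spaced edges built at this magnitude (a simplified stand-in for
# the real binning): every edge is rounded to a multiple of 2^-20, so
# the differences cannot all be equal.
edges  <- edge + width * (0:30)
widths <- diff(edges)
unique(widths)     # more than one distinct value
```

The same code with edge set to something small, such as 10, gives widths that agree to roughly 15 digits, which is why the problem only appears for data of this magnitude.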


There is a workaround, though.

ggplot(data=mtcars, aes(x=6.8e+09 + qsec)) + 
  geom_histogram(position="identity")


gives what you want and does not require the widths to be equal.  If you 
had more than one group, position="stack" and position="identity" are 
quite different, but they are equivalent for one group, so you can 
get away with switching one for the other in this case.
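
Another option, offered here only as a sketch, is to subtract a constant from the data before binning so that the bin edges are small numbers, and then mention the constant in the axis label. The variable names and the choice of offset are hypothetical, and the breaks are built with seq() as a stand-in for the range/30 default rather than with ggplot2's own binning code.

```r
x <- 6.8e9 + mtcars$qsec        # same construction as the example above

# Hypothetical reference value; any constant close to the data works.
offset    <- floor(min(x))
x_shifted <- x - offset         # values are now below 10

# 30 bins across the range, computed at both magnitudes.
breaks_raw     <- seq(min(x), max(x), length.out = 31)
breaks_shifted <- seq(min(x_shifted), max(x_shifted), length.out = 31)

sd(diff(breaks_raw))            # visible round-off at the 6.8e9 scale
sd(diff(breaks_shifted))        # many orders of magnitude smaller

# With ggplot2 loaded, the shifted values could then be plotted, e.g.:
# ggplot(data.frame(x = x_shifted), aes(x = x)) + geom_histogram() +
#   xlab(paste("value -", format(offset, scientific = FALSE)))
```

The subtraction itself loses nothing here, because the values and the offset are close in magnitude; only the later bin arithmetic benefits from the smaller scale.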



>    Regards,
>   Mike


--
Brian Diggs
Senior Research Associate, Department of Surgery, Oregon Health &
Science University




Re: [R] ggplot2 histograms... a subtle error found

2010-08-02 Thread Xu Wang

There is a google group dedicated to ggplot2. It might be worth making a post
there:

http://groups.google.com/group/ggplot2?pli=1



[R] ggplot2 histograms... a subtle error found

2010-07-28 Thread Mike Williamson
Hello all,

I have a peculiar and particular bug that I stumbled across with
ggplot2.  I cannot seem to replicate it with anything other than my specific
data set.

Here is the problem:

   - when I try to plot a histogram, allowing for ggplot2 to decide the
   binwidths itself, I get the following error:
  - stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to
  adjust this.
  - Error: position_stack requires constant width

My code is simply:

ggplot(data=myDataSet, aes(x=myVarOI)) + geom_histogram()

or

qplot(myDataSet$myVarOI)

If I go ahead and set the binwidth to some value, then the plot can be
made without problems.

The problem is with the specific data that it is trying to plot.  I
suspect it is trying to create bins of different sizes, from the error
code.  Here are the basics of my data set:

   - length:  1936 entries
   - 1906 unique entries
   - stats:
          Min.   1st Qu.    Median      Mean   3rd Qu.      Max.
     3.200e+09 6.312e+09 6.591e+09 6.874e+09 7.551e+09 1.083e+10



I cannot imagine this can be solved without my specifically uploading
the actual data.  If I simply attach it, will it be received by r-help?
Hadley, if you're interested, would you like me to send you the data
directly to you?

  Regards,
 Mike





Telescopes and bathyscaphes and sonar probes of Scottish lakes,
Tacoma Narrows bridge collapse explained with abstract phase-space maps,
Some x-ray slides, a music score, Minard's Napoleanic war:
The most exciting frontier is charting what's already here.
  -- xkcd

