Re: [R] ggplot2 histograms... a subtle error found
> When ggplot2 verifies the widths before stacking (the default position
> for histograms), it computes the widths from the minimum and maximum
> values for each bin. However, because the width of the bins (0.28) is
> much smaller than the scale of the edges (6.8e+09), there is some
> underflow and the widths don't all come out equal:
>
> # in ggplot2::collide
> with(data, xmax-xmin)
> #  [1] 0.2799988 0.2799988 0.287 0.2799988 0.2799988 0.2799988 0.287 0.2799988 0.2799988
> # [10] 0.2799988 0.287 0.2799988 0.2799988 0.2799988 0.287 0.2799988 0.2799988 0.287
> # [19] 0.2799988 0.2799988 0.2799988 0.287 0.2799988 0.2799988 0.2799988 0.287 0.2799988
> # [28] 0.2799988 0.2799988 0.287 0.2799988 0.2799988
> unique(with(data, xmax - xmin))
> # [1] 0.2799988 0.287
>
> So ggplot2 concludes the widths are not equal and gives the error you
> see.

Well, what I actually check is length(widths) > 1 && sd(widths) > 1e-6, but in this case sd(widths) is 1.35e-06, just over my threshold. I could change this, but that already seems like a fairly conservative check to me, and I don't know enough about floating point to be sure of the consequences of raising it further.

Hadley

--
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
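As a rough illustration of the check described above, here is a minimal sketch in base R. The function names widths_differ() and widths_differ_scaled(), the sample widths, and the factor of 4 in the scaled tolerance are all assumptions made for this example; none of them is taken from the ggplot2 source.

```r
# Hypothetical helper mirroring the check quoted above:
# report a problem when there is more than one width and their
# standard deviation exceeds a fixed tolerance.
widths_differ <- function(widths, tol = 1e-6) {
  length(widths) > 1 && sd(widths) > tol
}

# Assumed alternative (not ggplot2's method): scale the tolerance by the
# magnitude of the bin edges, because the spacing between adjacent
# doubles grows with magnitude.
widths_differ_scaled <- function(widths, edge_scale, k = 4) {
  tol <- k * .Machine$double.eps * edge_scale
  length(widths) > 1 && sd(widths) > tol
}

# Made-up widths: 24 bins of 0.28 and 8 bins that are three
# representable steps wider at a magnitude of about 6.8e9.
step   <- 2^-20   # spacing of doubles between 2^32 and 2^33
widths <- c(rep(0.28, 24), rep(0.28 + 3 * step, 8))

fixed_result  <- widths_differ(widths)
scaled_result <- widths_differ_scaled(widths, edge_scale = 6.8e9)

print(sd(widths))
print(fixed_result)
print(scaled_result)
```

With these made-up widths the fixed threshold reports a difference while the scaled one does not. Whether a scaled tolerance would be appropriate inside ggplot2 is a separate question that this sketch does not settle.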
Re: [R] ggplot2 histograms... a subtle error found
On 7/28/2010 5:04 PM, Mike Williamson wrote:
> Hello all,
>
> I have a peculiar and particular bug that I stumbled across with
> ggplot2. I cannot seem to replicate it with anything other than my
> specific data set.
>
> Here is the problem:
>
>   - when I try to plot a histogram, allowing for ggplot2 to decide the
>     binwidths itself, I get the following error:
>       - stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to
>         adjust this.
>       - Error: position_stack requires constant width
>
> My code is simply:
>
>   ggplot(data=myDataSet, aes(x=myVarOI)) + geom_histogram()
>
> or
>
>   qplot(myDataSet$myVarOI)
>
> If I go ahead and set the binwidth to some value, then the plot can be
> made without problems.
>
> The problem is with the specific data that it is trying to plot. I
> suspect it is trying to create bins of different sizes, from the error
> code. Here are the basics of my data set:
>
>   - length: 1936 entries
>   - 1906 unique entries
>   - stats:
>          Min.   1st Qu.    Median      Mean   3rd Qu.      Max.
>     3.200e+09 6.312e+09 6.591e+09 6.874e+09 7.551e+09 1.083e+10
>
> I cannot imagine this can be solved without my specifically uploading
> the actual data. If I simply attach it, will it be received by r-help?
> Hadley, if you're interested, would you like me to send you the data
> directly to you?

I can reproduce it with generic data. The problem is one of underflow.

  ggplot(data=mtcars, aes(x=6.8e+09 + qsec)) + geom_histogram()
  #stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
  #Error during wrapup: position_stack requires constant width

When ggplot2 verifies the widths before stacking (the default position for histograms), it computes the widths from the minimum and maximum values for each bin. However, because the width of the bins (0.28) is much smaller than the scale of the edges (6.8e+09), there is some underflow and the widths don't all come out equal:

  # in ggplot2::collide
  with(data, xmax-xmin)
  #  [1] 0.2799988 0.2799988 0.287 0.2799988 0.2799988 0.2799988 0.287 0.2799988 0.2799988
  # [10] 0.2799988 0.287 0.2799988 0.2799988 0.2799988 0.287 0.2799988 0.2799988 0.287
  # [19] 0.2799988 0.2799988 0.2799988 0.287 0.2799988 0.2799988 0.2799988 0.287 0.2799988
  # [28] 0.2799988 0.2799988 0.287 0.2799988 0.2799988
  unique(with(data, xmax - xmin))
  # [1] 0.2799988 0.287

So ggplot2 concludes the widths are not equal and gives the error you see.

I don't think this is a bug; you are operating at the edge of what the floating point precision will allow, and seem to have crossed that edge in this case. (I suppose ggplot2 could carry the information that the bins are created with equal widths and then not have to check that later, but that seems unnecessary overhead.)

There is a workaround, though.

  ggplot(data=mtcars, aes(x=6.8e+09 + qsec)) + geom_histogram(position="identity")

gives what you want and does not require the widths to be equal. If you had more than one group, position="stack" and position="identity" are quite different, but they are equivalent for one group, so you can get away with switching one for the other in this case.

> Regards,
> Mike

--
Brian Diggs
Senior Research Associate, Department of Surgery,
Oregon Health & Science University
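The loss of precision described above can be reproduced in base R without ggplot2. The sketch below is a reconstruction under stated assumptions: the origin 6.8e9 + 14.5, the 31 break points, and all variable names are chosen for illustration and are not ggplot2's actual binning code. Strictly speaking, the effect is rounding to the nearest representable double rather than underflow.

```r
# Assumed setup: 31 equally spaced breaks, 0.28 apart, near 6.8e9.
binwidth <- 0.28
origin   <- 6.8e9 + 14.5
breaks   <- origin + binwidth * (0:30)

# Widths recovered from the edges, as a width check would see them.
widths <- diff(breaks)

# Spacing between adjacent doubles at this magnitude:
# 6.8e9 lies between 2^32 and 2^33, so the spacing is 2^(32 - 52).
ulp <- 2^(floor(log2(6.8e9)) - 52)

print(ulp)             # 2^-20, about 9.5e-07
print(unique(widths))  # more than one distinct value
print(sd(widths))

# For contrast, the same breaks without the 6.8e9 offset: the widths
# agree with 0.28 to many more digits.
small_widths <- diff(14.5 + binwidth * (0:30))
print(max(abs(small_widths - binwidth)))
```

Each edge can only be stored to the nearest multiple of the spacing, so the differences between neighbouring edges land on two adjacent multiples instead of on 0.28 exactly. This sketch produces a smaller spread than the one reported in the thread, which suggests ggplot2 arrives at its edges by a somewhat different route.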
Re: [R] ggplot2 histograms... a subtle error found
There is a google group dedicated to ggplot2. It might be worth making a post there: http://groups.google.com/group/ggplot2?pli=1
[R] ggplot2 histograms... a subtle error found
Hello all,

I have a peculiar and particular bug that I stumbled across with ggplot2. I cannot seem to replicate it with anything other than my specific data set.

Here is the problem:

  - when I try to plot a histogram, allowing for ggplot2 to decide the
    binwidths itself, I get the following error:
      - stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to
        adjust this.
      - Error: position_stack requires constant width

My code is simply:

  ggplot(data=myDataSet, aes(x=myVarOI)) + geom_histogram()

or

  qplot(myDataSet$myVarOI)

If I go ahead and set the binwidth to some value, then the plot can be made without problems.

The problem is with the specific data that it is trying to plot. I suspect it is trying to create bins of different sizes, from the error code. Here are the basics of my data set:

  - length: 1936 entries
  - 1906 unique entries
  - stats:
         Min.   1st Qu.    Median      Mean   3rd Qu.      Max.
    3.200e+09 6.312e+09 6.591e+09 6.874e+09 7.551e+09 1.083e+10

I cannot imagine this can be solved without my specifically uploading the actual data. If I simply attach it, will it be received by r-help? Hadley, if you're interested, would you like me to send you the data directly to you?

Regards,
Mike

Telescopes and bathyscaphes and sonar probes of Scottish lakes,
Tacoma Narrows bridge collapse explained with abstract phase-space maps,
Some x-ray slides, a music score, Minard's Napoleonic war:
The most exciting frontier is charting what's already here.
  -- xkcd