>>>>> William Dunlap <wdun...@tibco.com>
>>>>>     on Thu, 22 Dec 2016 09:08:35 -0800 writes:

> As a practical matter, 'continuous' data must be discretized, so if you
> have long vectors of it you will run into this problem.

> Bill Dunlap
> TIBCO Software
> wdunlap tibco.com

Yes, it is true that on the computer and in statistics we never
have continuous data in the strict sense.

My point was and still is that a histogram is the wrong graphical
tool for visualizing a distribution on a small finite set,
such as the dice rolls 'itpro' has used.

And yes, if (s)he used something like

    dice <- ceiling(6 * runif(100))

and really prefers to use hist() over (something like)

    plot(table(dice), lwd = 6)

then an appropriate graphic would rather be

    hist(dice, freq=TRUE, col="orange", breaks = (31:(6*32))/32)

(and the default breaks for sample size N = 100'000 are indeed
relatively close to that because, as we both know, the number of
default breaks grows (slowly) with N).

For me, histograms are a (poor, but easy to understand and
explain) version of density estimates (where the underlying density
is with respect to the Lebesgue measure or similar).

Now back to large / long vectors of data:

If you need to bin large vectors, you will hopefully be binning
into hundreds or thousands of bins (because 1000 is still much
smaller than "large"), and then you have actually computed the data
for a histogram yourself already; so I personally would again prefer
not to use hist(), but to write my own "3 line" function that returns
a "histogram" object on which I'd call plot(.).

So, providing such a short function may be useful, notably on
the ?hist help page?

Martin Maechler,
ETH Zurich

> On Thu, Dec 22, 2016 at 8:19 AM, Martin Maechler <maech...@stat.math.ethz.ch> wrote:

>> >>>>> itpro <itp...@yandex.ru>
>> >>>>>     on Thu, 22 Dec 2016 16:17:28 +0300 writes:
>>
>> > Hi, everyone.
>> > I stumbled upon weird histogram behaviour.
>>
>> > Consider this "dice emulator":
>> > Step 1: Generate uniform random array x of size N.
>> > Step 2: Multiply each item by six and round up to the next integer
>> > to get numbers 1 to 6.
>> > Step 3: Plot histogram.
>>
>> >> x<-runif(N)
>> >> y<-ceiling(x*6)
>> >> hist(y,freq=TRUE, col='orange')
>>
>> > Now what I get with N=100000
>>
>> >> x<-runif(100000)
>> >> y<-ceiling(x*6)
>> >> hist(y,freq=TRUE, col='green')
>>
>> > At first glance looks OK.
>>
>> > Now try N=100
>>
>> >> x<-runif(100)
>> >> y<-ceiling(x*6)
>> >> hist(y,freq=TRUE, col='red')
>>
>> > Now first bar is not where it should be.
>> > Hmm. Look again at the 100000 histogram... First bar is not where I want
>> > it, it's only less striking due to narrow bars.
>>
>> > So, first bar is always in wrong position. How do I fix it to make
>> > perfectly spaced bars?
>>
>> Don't use histograms *at all* for such discrete integer data.
>>
>> N <- rpois(100, 5)
>> plot(table(N), lwd = 4)
>>
>> Histograms should only be used for continuous data (or discrete data
>> with "many" possible values).
>>
>> It's a pain to see them so often "misused" for data like the 'N' above.
>>
>> Martin Maechler,
>> ETH Zurich
>>
>> ______________________________________________
>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
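
For the pre-binned case discussed above (long vectors already reduced to
counts per bin), here is a minimal sketch of what such a short helper could
look like. It is only an illustration, not code from the ?hist help page: the
name asHistogram() and its arguments are hypothetical, and it assumes that a
list with the components breaks, counts, density, mids, xname and equidist and
class "histogram" is enough for plot() to dispatch to the histogram plot
method. Note also that the binning via findInterval() in the usage part uses
left-closed intervals, whereas hist() by default uses right-closed ones.

```r
## Hypothetical helper (name and arguments are illustrative, not base R):
## build an object of class "histogram" from break points and
## already computed bin counts.
asHistogram <- function(breaks, counts, xname = "x") {
  stopifnot(length(breaks) == length(counts) + 1L, !is.unsorted(breaks))
  widths <- diff(breaks)
  structure(
    list(breaks   = breaks,
         counts   = counts,
         density  = counts / (sum(counts) * widths),  # integrates to 1
         mids     = breaks[-1L] - widths / 2,
         xname    = xname,
         equidist = diff(range(widths)) < 1e-7 * mean(widths)),
    class = "histogram")
}

## Usage: bin a long vector once, then plot the resulting object
set.seed(1)
x <- rnorm(1e5)
breaks <- seq(floor(min(x)), ceiling(max(x)), length.out = 201)
counts <- tabulate(findInterval(x, breaks, rightmost.closed = TRUE),
                   nbins = length(breaks) - 1L)
h <- asHistogram(breaks, counts, xname = "x")
plot(h)
```

The point of the design is that the expensive step (binning) happens once,
wherever it is convenient, and only the small vector of counts is handed to
the plotting code.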