Hello all,

I noticed that hist() uses “right = TRUE” by default when assigning data to the bins defined by the breaks. As a result, each bin is closed on the right and open on the left: (a, b].
I think “right = FALSE” would be more consistent with the usual definitions of the lower and upper limits of bins in applied statistics, and I suggest that you consider making it the default for hist().

For example, I used an R script with the cuts specified as “right = FALSE” to generate the following frequency distribution of duration of hospitalization (from an exercise in Bernard Rosner’s Fundamentals of Biostatistics):

    Interval   Number   Proportion
    [0,5)          5       0.20
    [5,10)        12       0.48
    [10,15)        6       0.24
    [15,20)        1       0.04
    [20,25)        0       0.00
    [25,30]        1       0.04

Durations are whole numbers of days, and the right limits are open except in the last bin. The bins therefore actually contain 0-4, 5-9, 10-14, … days, and so on. This format agrees with usual practice and recommendations. In fact, it matches the procedure Rosner describes in his book (“from y inclusive to y exclusive”).

If I use hist() with its default settings to generate a histogram with these 6 bins, I get the following:
[Attachment: histogram1.pdf]
…which actually shows the frequency distribution obtained with “right = TRUE”:

    Interval   Number   Proportion
    [0,5]          9       0.36
    (5,10]         9       0.36
    (10,15]        5       0.20
    (15,20]        1       0.04
    (20,25]        0       0.00
    (25,30]        1       0.04

In this case, the real limits of the bins are 0-5, 6-10, 11-15, … and so on. If I add “right = FALSE” to the hist() call, I get the histogram of my original frequency distribution. Compare bins 1 and 2 in both distributions and histograms. The four stays of exactly 5 days are counted in bin 1 with “right = TRUE” but in bin 2 with “right = FALSE”.
[Attachment: Histogram2.pdf]
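The following sketch shows exactly where the two conventions diverge. It is written in Python, not R, and models the interval rules from the description of the right and include.lowest arguments in R's documentation for hist() and cut(). In R itself, the same comparison can be made with cut(x, breaks, right = FALSE, include.lowest = TRUE). The helper names (bin_index, bin_counts, integer_limits) and the example durations are hypothetical. The durations were chosen only so that several values fall on break points; they are not the data from Rosner's exercise. Like hist(), the sketch assumes include.lowest = TRUE unless told otherwise.

```python
from bisect import bisect_left, bisect_right


def bin_index(x, breaks, right=True, include_lowest=True):
    """Return the 0-based bin of x, or None if x lies outside every bin.

    Hypothetical helper modelled on R's documented rules:
    right=True gives (a, b] bins, right=False gives [a, b).
    With include_lowest=True, a value equal to the lowest break (right=True)
    or to the highest break (right=False) is kept in the first/last bin.
    """
    k = len(breaks) - 1  # number of bins
    if right:
        if include_lowest and x == breaks[0]:
            return 0
        # (a, b]: first break >= x marks the bin's right end
        i = bisect_left(breaks, x) - 1
    else:
        if include_lowest and x == breaks[-1]:
            return k - 1
        # [a, b): first break > x marks the bin's right end
        i = bisect_right(breaks, x) - 1
    return i if 0 <= i < k else None


def bin_counts(values, breaks, right=True, include_lowest=True):
    """Count values per bin; values outside all bins are simply skipped here."""
    counts = [0] * (len(breaks) - 1)
    for x in values:
        i = bin_index(x, breaks, right, include_lowest)
        if i is not None:
            counts[i] += 1
    return counts


def integer_limits(breaks, right=True, include_lowest=True):
    """Inclusive whole-number range covered by each bin, assuming
    whole-number data and whole-number breaks (e.g. days in hospital)."""
    k = len(breaks) - 1
    limits = []
    for i in range(k):
        a, b = breaks[i], breaks[i + 1]
        if right:
            # (a, b] holds a+1..b; the first bin also holds a if include_lowest
            lo = a if (include_lowest and i == 0) else a + 1
            limits.append((lo, b))
        else:
            # [a, b) holds a..b-1; the last bin also holds b if include_lowest
            hi = b if (include_lowest and i == k - 1) else b - 1
            limits.append((a, hi))
    return limits


breaks = list(range(0, 31, 5))  # 0, 5, 10, ..., 30
stays = [3, 5, 5, 10, 30]       # hypothetical durations, several on break points

print(bin_counts(stays, breaks, right=True))    # [3, 1, 0, 0, 0, 1]
print(bin_counts(stays, breaks, right=False))   # [1, 2, 1, 0, 0, 1]
print(integer_limits(breaks, right=True))       # [(0, 5), (6, 10), (11, 15), (16, 20), (21, 25), (26, 30)]
print(integer_limits(breaks, right=False))      # [(0, 4), (5, 9), (10, 14), (15, 19), (20, 24), (25, 30)]
```

Only values lying exactly on a break point change bins. Every other value is counted identically under both conventions, which is why the two tables above differ mainly in bins 1 and 2.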
The value of the “right” argument may ultimately be a matter of preference. Still, I think most R users would benefit from bins whose limits are closed on the left and open on the right, and therefore from having “right = FALSE” as the default for hist(). I am aware that I am writing from the limited perspective of my own field (epidemiology and biostatistics). However, there are plenty of groupings in common use that follow this convention and argue for reconsidering the default. Here are just a few:

https://www.statcan.gc.ca/eng/concepts/definitions/age2
https://seer.cancer.gov/stdpopulations/stdpop.19ages.html
https://www.census.gov/data/tables/time-series/demo/income-poverty/cps-hinc/hinc-01.html

Thank you.

José

José G. Conde, MD, MPH
Professor, School of Medicine
Director, CentIT2
UPR Medical Sciences Campus
Tel (787) 763-9401
Fax (787) 758-5206
Email: jose.con...@upr.edu
URL: http://rcmi.rcm.upr.edu
______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel