Greetings, I found a bug in the cut.Date function of base R. My session info is at the bottom.
The bug is in the "months" section of the code. Consider the following: # Create vector with 365 days of year and cut into 12 months dates <- seq(as.Date("2021-1-1"),as.Date("2021-12-31"),by="day") months <- cut(dates,"months",labels=1:12) table(months) # months # 1 2 3 4 5 6 7 8 9 10 11 12 # 31 28 31 30 31 30 31 31 30 31 30 31 # cut.Date works perfectly # Extend dates vector by adding following January and cut into 13 months dates <- seq(as.Date("2021-1-1"),as.Date("2022-1-31"),by="day") months <- cut(dates,"months",labels=1:13) # Error in cut.default(unclass(x), unclass(breaks), labels = labels, right = right, : # lengths of 'breaks' and 'labels' differ months <- cut(dates,"months",labels=1:14) table(months) # months # 1 2 3 4 5 6 7 8 9 10 11 12 13 14 # 31 28 31 30 31 30 31 31 30 31 30 31 31 0 # Bug: cut.Date adds a superfluous bin # This is because February has less than 31 days # Thus, the code adjusts the end into March instead of February Inspecting the source code, I see the problem resides in lines 31 through 41, the section dealing with breaks="months": if (valid == 3L) { start$mday <- 1L start$isdst <- -1L end <- as.POSIXlt(max(x, na.rm = TRUE)) step <- if (length(by2) == 2L) as.integer(by2[1L]) else 1L 38 end <- as.POSIXlt(end + (31 * step * 86400)) end$mday <- 1L end$isdst <- -1L breaks <- as.Date(seq(start, end, breaks)) When the end is adjusted in line 38, there are instances where an unnecessary extra bin is created. This is accounted for in the breaks="quarters" case by testing at the end and removing the final break if necessary. The same method employed here fixes the bug: if (valid == 3L) { start$mday <- 1L start$isdst <- -1L maxx <- max(x, na.rm = TRUE) # Added line end <- as.POSIXlt(maxx) # Modified line step <- if (length(by2) == 2L) as.integer(by2[1L]) else 1L end <- as.POSIXlt(end + (31 * step * 86400)) end$mday <- 1L end$isdst <- -1L breaks <- as.Date(seq(start, end, breaks)) lb <- length(breaks) # Added line if (maxx < breaks[lb - 1]) # Added line breaks <- breaks[-lb] # Added line I modified cut.Date with the above code changes and stored it in a function called cut_Date(): dates <- seq(as.Date("2021-1-1"),as.Date("2022-1-31"),by="day") months <- cut_Date(dates,"months",labels=1:13) table(months) # months # 1 2 3 4 5 6 7 8 9 10 11 12 13 # 31 28 31 30 31 30 31 31 30 31 30 31 31 Thanks for your time and consideration! > sessionInfo() R version 4.0.3 (2020-10-10) Platform: x86_64-w64-mingw32/x64 (64-bit) Running under: Windows 10 x64 (build 19041) Matrix products: default locale: [1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 [3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C [5] LC_TIME=English_United States.1252 attached base packages: [1] stats graphics grDevices utils datasets methods base loaded via a namespace (and not attached): [1] compiler_4.0.3 tools_4.0.3 tinytex_0.28 xfun_0.19 Many Blessings, Christopher Carbone "Follow your bliss and doors will open where you didn't know they were going to be..." -Joseph Campbell ______________________________________________ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel