Hi all,

Apologies for the delay in my engaging in this thread. I was traveling this week.

The problem that Gabor raised was caused by the patch that I submitted to fix a problem with the referenced functions when using 'months' and 'years' as the interval. The prior versions were problematic:

  https://stat.ethz.ch/pipermail/r-devel/2008-January/048004.html

The patch fixed the error, but since I used hist.Date() as the reference model and did not note the subtle difference in cut.Date() relative to specifying the breaks increment value, this functionality was lost when the same modification was made to the code in cut.Date().

Roger's patch helps, but does not totally remedy the situation. One also needs to modify the method used for specifying the max value 'end' for the breaks in order to include the max 'x' Date value in the result.

Hence, I am attaching proposed patches against R-devel for base:::dates.R and base:::datetime.R.

I am also attaching a patch for tests:::reg-tests-1.R to add a check for this situation to the regression tests that were also added subsequent to that prior set of patches that I had submitted.

If perhaps Roger and Gabor could so some testing on these patches before they are considered for inclusion into the R-devel tree, it would be helpful to check to see if I have missed something else here.

Thanks for raising this issue.

Regards,

Marc Schwartz

Roger D. Peng wrote:
Seems changes in r44116 force the interval to be single months (or years) instead of whatever the user specified. I think the attached patches correct this.

Interestingly, 'cut' and 'seq' allow for the 'breaks' specification to be something like "3 months" but the documentation for 'hist' does not allow for this type of specification.

-roger

Gabor Grothendieck wrote:
cut.Date and cut.POSIXt indicate that the breaks argument
can be an integer followed by a space followed by "year", etc.
but it seems the integer is ignored.

For example, I assume that breaks = "3 months" is supposed
to cut it into quarters but, in fact, it cuts it into months as if
3 had not been there.

d <- seq(Sys.Date(), length = 12, by = "month")
cut(d, "3 months")
 [1] 2008-03-01 2008-04-01 2008-05-01 2008-06-01 2008-07-01 2008-08-01
2008-09-01 2008-10-01 2008-11-01 2008-12-01 2009-01-01 2009-02-01
Levels: 2008-03-01 2008-04-01 2008-05-01 2008-06-01 2008-07-01
2008-08-01 2008-09-01 2008-10-01 2008-11-01 2008-12-01 2009-01-01
2009-02-01
cut(as.POSIXct(d), "3 months")
 [1] 2008-03-01 2008-04-01 2008-05-01 2008-06-01 2008-07-01 2008-08-01
2008-09-01 2008-10-01 2008-11-01 2008-12-01 2009-01-01 2009-02-01
Levels: 2008-03-01 2008-04-01 2008-05-01 2008-06-01 2008-07-01
2008-08-01 2008-09-01 2008-10-01 2008-11-01 2008-12-01 2009-01-01
2009-02-01
cut(as.POSIXlt(d), "3 months")
 [1] 2008-03-01 2008-04-01 2008-05-01 2008-06-01 2008-07-01 2008-08-01
2008-09-01 2008-10-01 2008-11-01 2008-12-01 2009-01-01 2009-02-01
Levels: 2008-03-01 2008-04-01 2008-05-01 2008-06-01 2008-07-01
2008-08-01 2008-09-01 2008-10-01 2008-11-01 2008-12-01 2009-01-01
2009-02-01

--- datesORIG.R	2008-03-20 14:25:13.000000000 -0500
+++ dates.R	2008-03-20 14:38:21.000000000 -0500
@@ -322,17 +322,19 @@
 	if(valid == 3) {
         start$mday <- 1
         end <- as.POSIXlt(max(x, na.rm = TRUE))
-        end <- as.POSIXlt(end + (31 * 86400))
+        step <- ifelse(length(by2) == 2, as.integer(by2[1]), 1)
+        end <- as.POSIXlt(end + (31 * step * 86400))
         end$mday <- 1
-        breaks <- as.Date(seq(start, end, "months"))
+        breaks <- as.Date(seq(start, end, breaks))
     } else if(valid == 4) {
         start$mon <- 0
         start$mday <- 1
         end <- as.POSIXlt(max(x, na.rm = TRUE))
-        end <- as.POSIXlt(end + (366 * 86400))
+        step <- ifelse(length(by2) == 2, as.integer(by2[1]), 1)
+        end <- as.POSIXlt(end + (366 * step * 86400))
         end$mon <- 0
         end$mday <- 1
-        breaks <- as.Date(seq(start, end, "years"))
+        breaks <- as.Date(seq(start, end, breaks))
     } else {
         start <- .Internal(POSIXlt2Date(start))
         if (length(by2) == 2) incr <- incr * as.integer(by2[1])
--- datetimeORIG.R	2008-03-20 14:25:20.000000000 -0500
+++ datetime.R	2008-03-20 15:25:49.000000000 -0500
@@ -727,17 +727,19 @@
 	if(valid == 6) {
         start$mday <- 1
         end <- as.POSIXlt(max(x, na.rm = TRUE))
-        end <- as.POSIXlt(end + (31 * 86400))
+        step <- ifelse(length(by2) == 2, as.integer(by2[1]), 1)
+        end <- as.POSIXlt(end + (31 * step * 86400))
         end$mday <- 1
-        breaks <- seq(start, end, "months")
+        breaks <- seq(start, end, breaks)
     } else if(valid == 7) {
         start$mon <- 0
         start$mday <- 1
         end <- as.POSIXlt(max(x, na.rm = TRUE))
-        end <- as.POSIXlt(end + (366 * 86400))
+        step <- ifelse(length(by2) == 2, as.integer(by2[1]), 1)
+        end <- as.POSIXlt(end + (366 * step* 86400))
         end$mon <- 0
         end$mday <- 1
-        breaks <- seq(start, end, "years")
+        breaks <- seq(start, end, breaks)
     } else {
         if (length(by2) == 2) incr <- incr * as.integer(by2[1])
 	    maxx <- max(x, na.rm = TRUE)
--- reg-tests-1ORIG.R	2008-03-20 09:18:19.000000000 -0500
+++ reg-tests-1.R	2008-03-20 15:15:56.000000000 -0500
@@ -5025,7 +5025,7 @@
 ## was about 0.0005 in 2.6.1 patched
 
 
-## tests of problems fixed by Marc Schwarz's patch for
+## tests of problems fixed by Marc Schwartz's patch for
 ## cut/hist for Dates and POSIXt
 Dates <- seq(as.Date("2005/01/01"), as.Date("2009/01/01"), "day")
 months <- format(Dates, format = "%m")
@@ -5036,20 +5036,32 @@
 stopifnot(identical(hist(Dates, "month", plot = FALSE)$counts, mn))
 # Test cut.Date() for months
 stopifnot(identical(as.vector(table(cut(Dates, "month"))), mn))
+# Test cut.Date() for 3 months
+stopifnot(identical(as.vector(table(cut(Dates, "3 months"))),
+                    as.integer(colSums(matrix(c(mn, 0, 0), nrow = 3)))))
 # Test hist.Date() for years
 stopifnot(identical(hist(Dates, "year", plot = FALSE)$counts, ty))
 # Test cut.Date() for years
 stopifnot(identical(as.vector(table(cut(Dates, "years"))),ty))
+# Test cut.Date() for 3 years
+stopifnot(identical(as.vector(table(cut(Dates, "3 years"))),
+                    as.integer(colSums(matrix(c(ty, 0), nrow = 3)))))
 
 Dtimes <- as.POSIXlt(Dates)
 # Test hist.POSIXt() for months
 stopifnot(identical(hist(Dtimes, "month", plot = FALSE)$counts, mn))
 # Test cut.POSIXt() for months
 stopifnot(identical(as.vector(table(cut(Dtimes, "month"))), mn))
+# Test cut.POSIXt() for 3 months
+stopifnot(identical(as.vector(table(cut(Dtimes, "3 months"))),
+                    as.integer(colSums(matrix(c(mn, 0, 0), nrow = 3)))))
 # Test hist.POSIXt() for years
 stopifnot(identical(hist(Dtimes, "year", plot = FALSE)$counts, ty))
 # Test cut.POSIXt() for years
 stopifnot(identical(as.vector(table(cut(Dtimes, "years"))), ty))
+# Test cut.POSIXt() for 3 years
+stopifnot(identical(as.vector(table(cut(Dtimes, "3 years"))),
+                    as.integer(colSums(matrix(c(ty, 0), nrow = 3)))))
 ## changed in 2.6.2
 
 
______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Reply via email to