Hi all,
Apologies for the delay in joining this thread; I have been traveling
this week.
The problem that Gabor raised was caused by the patch I submitted to fix
the cut() and hist() methods for Date and POSIXt objects when 'months'
or 'years' is used as the interval. The prior versions were problematic:
https://stat.ethz.ch/pipermail/r-devel/2008-January/048004.html
That patch fixed the error, but I used hist.Date() as the reference
model and did not notice a subtle difference: cut.Date() also allows
'breaks' to carry an increment (e.g. "3 months"). When the same
modification was made to the code in cut.Date(), that increment was lost.
Roger's patch helps, but does not fully remedy the situation. The
computation of 'end', the upper limit of the breaks, must also be scaled
by the increment so that the maximum 'x' value falls inside the last
interval rather than beyond it.
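To illustrate why 'end' needs scaling too, here is a minimal standalone sketch of the month case. The helper month_breaks() and its scale_end argument are hypothetical: they only mirror the idea of the patch using plain Date arithmetic, not the actual code in dates.R. With scale_end = FALSE the increment is honoured in seq() but 'end' is padded by only one month, which is roughly the state after Roger's patch alone.

```r
## Hypothetical helper, not the code in dates.R.
## Assumes 'breaks' is "months" or of the form "N months".
month_breaks <- function(x, breaks = "3 months", scale_end = TRUE) {
  ## "3 months" -> c("3", "months"); a bare "months" has length 1
  by2 <- strsplit(breaks, " ", fixed = TRUE)[[1]]
  step <- if (length(by2) == 2) as.integer(by2[1]) else 1L
  ## Emulate Roger's patch alone: increment used, 'end' not scaled
  if (!scale_end) step <- 1L
  ## First day of the month holding the earliest date
  start <- as.Date(format(min(x, na.rm = TRUE), "%Y-%m-01"))
  ## Pad the latest date by about 'step' months, then snap to day 1
  end <- as.Date(format(max(x, na.rm = TRUE) + 31 * step, "%Y-%m-01"))
  seq(start, end, by = breaks)
}

x <- as.Date(c("2008-03-10", "2009-01-15"))
old <- month_breaks(x, "3 months", scale_end = FALSE)
new <- month_breaks(x, "3 months")
max(old) < max(x)  # last break precedes max(x), so it would be cut to NA
max(new) > max(x)  # last break now lies beyond max(x)
```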
Hence, I am attaching proposed patches against R-devel for
base:::dates.R and base:::datetime.R.
I am also attaching a patch for tests:::reg-tests-1.R that adds checks
for this case to the regression tests that were added after my earlier
set of patches.
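The expected counts in the new regression tests group consecutive monthly (or yearly) counts into threes. The sketch below shows that arithmetic on the same daily sequence the tests use; note that mn and ty are recomputed here with table(format(...)) as an assumption, since the lines of reg-tests-1.R that build them are not shown in the patch.

```r
## Daily dates spanning Jan 2005 .. Jan 2009, as in the regression test
Dates <- seq(as.Date("2005/01/01"), as.Date("2009/01/01"), "day")

## Days per calendar month: 49 months, so pad with two zeros to 51
mn <- as.vector(table(format(Dates, "%Y-%m")))
## Each column of the 3-row matrix is one quarter; colSums() totals it
quarters <- as.integer(colSums(matrix(c(mn, 0, 0), nrow = 3)))
length(quarters)  # 17 quarters; the first is 31 + 28 + 31 = 90 days

## Days per year: 5 years, so pad with one zero to 6
ty <- as.vector(table(format(Dates, "%Y")))
triennia <- as.integer(colSums(matrix(c(ty, 0), nrow = 3)))
triennia          # 2005-2007 and 2008-2010 (the latter holds 366 + 1 days)
```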
It would be helpful if Roger and Gabor could do some testing on these
patches before they are considered for inclusion in the R-devel tree,
to check whether I have missed anything else.
Thanks for raising this issue.
Regards,
Marc Schwartz
Roger D. Peng wrote:
It seems that the changes in r44116 force the interval to be single
months (or years) instead of whatever the user specified. I think the
attached patches correct this.
Interestingly, 'cut' and 'seq' allow the 'breaks' (for 'seq', the 'by')
specification to be something like "3 months", but the documentation
for 'hist' does not mention this form of specification.
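For reference, this is what the "N months" / "N years" form means for seq() on Dates: the integer is a step multiplier applied to calendar months or years. A minimal example:

```r
## An increment in 'by' steps by whole calendar units
quarterly <- seq(as.Date("2008-01-01"), by = "3 months", length.out = 4)
format(quarterly)  # "2008-01-01" "2008-04-01" "2008-07-01" "2008-10-01"

every3y <- seq(as.Date("2005-01-01"), by = "3 years", length.out = 3)
format(every3y)    # "2005-01-01" "2008-01-01" "2011-01-01"
```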
-roger
Gabor Grothendieck wrote:
The documentation for cut.Date and cut.POSIXt says that the breaks
argument can be an integer followed by a space followed by "year", etc.,
but it seems the integer is ignored.
For example, I assume that breaks = "3 months" is supposed to cut the
data into quarters but, in fact, it cuts it into months, as if the 3
were not there.
d <- seq(Sys.Date(), length = 12, by = "month")
cut(d, "3 months")
[1] 2008-03-01 2008-04-01 2008-05-01 2008-06-01 2008-07-01 2008-08-01
2008-09-01 2008-10-01 2008-11-01 2008-12-01 2009-01-01 2009-02-01
Levels: 2008-03-01 2008-04-01 2008-05-01 2008-06-01 2008-07-01
2008-08-01 2008-09-01 2008-10-01 2008-11-01 2008-12-01 2009-01-01
2009-02-01
cut(as.POSIXct(d), "3 months")
[1] 2008-03-01 2008-04-01 2008-05-01 2008-06-01 2008-07-01 2008-08-01
2008-09-01 2008-10-01 2008-11-01 2008-12-01 2009-01-01 2009-02-01
Levels: 2008-03-01 2008-04-01 2008-05-01 2008-06-01 2008-07-01
2008-08-01 2008-09-01 2008-10-01 2008-11-01 2008-12-01 2009-01-01
2009-02-01
cut(as.POSIXlt(d), "3 months")
[1] 2008-03-01 2008-04-01 2008-05-01 2008-06-01 2008-07-01 2008-08-01
2008-09-01 2008-10-01 2008-11-01 2008-12-01 2009-01-01 2009-02-01
Levels: 2008-03-01 2008-04-01 2008-05-01 2008-06-01 2008-07-01
2008-08-01 2008-09-01 2008-10-01 2008-11-01 2008-12-01 2009-01-01
2009-02-01
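A reproducible variant of the example above, using a fixed start date in place of Sys.Date(). Assuming the increment is honoured, as the patches intend, the twelve months should fall into four quarterly levels of three months each (this is the behaviour one would expect from an R version that includes the fix):

```r
## Twelve consecutive month starts from a fixed date
d <- seq(as.Date("2008-03-01"), length.out = 12, by = "month")
q <- cut(d, "3 months")
levels(q)           # one level per quarter, labelled by its first day
as.vector(table(q)) # three months in each quarter
```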
--- datesORIG.R 2008-03-20 14:25:13.000000000 -0500
+++ dates.R 2008-03-20 14:38:21.000000000 -0500
@@ -322,17 +322,19 @@
if(valid == 3) {
start$mday <- 1
end <- as.POSIXlt(max(x, na.rm = TRUE))
- end <- as.POSIXlt(end + (31 * 86400))
+ step <- ifelse(length(by2) == 2, as.integer(by2[1]), 1)
+ end <- as.POSIXlt(end + (31 * step * 86400))
end$mday <- 1
- breaks <- as.Date(seq(start, end, "months"))
+ breaks <- as.Date(seq(start, end, breaks))
} else if(valid == 4) {
start$mon <- 0
start$mday <- 1
end <- as.POSIXlt(max(x, na.rm = TRUE))
- end <- as.POSIXlt(end + (366 * 86400))
+ step <- ifelse(length(by2) == 2, as.integer(by2[1]), 1)
+ end <- as.POSIXlt(end + (366 * step * 86400))
end$mon <- 0
end$mday <- 1
- breaks <- as.Date(seq(start, end, "years"))
+ breaks <- as.Date(seq(start, end, breaks))
} else {
start <- .Internal(POSIXlt2Date(start))
if (length(by2) == 2) incr <- incr * as.integer(by2[1])
--- datetimeORIG.R 2008-03-20 14:25:20.000000000 -0500
+++ datetime.R 2008-03-20 15:25:49.000000000 -0500
@@ -727,17 +727,19 @@
if(valid == 6) {
start$mday <- 1
end <- as.POSIXlt(max(x, na.rm = TRUE))
- end <- as.POSIXlt(end + (31 * 86400))
+ step <- ifelse(length(by2) == 2, as.integer(by2[1]), 1)
+ end <- as.POSIXlt(end + (31 * step * 86400))
end$mday <- 1
- breaks <- seq(start, end, "months")
+ breaks <- seq(start, end, breaks)
} else if(valid == 7) {
start$mon <- 0
start$mday <- 1
end <- as.POSIXlt(max(x, na.rm = TRUE))
- end <- as.POSIXlt(end + (366 * 86400))
+ step <- ifelse(length(by2) == 2, as.integer(by2[1]), 1)
+ end <- as.POSIXlt(end + (366 * step * 86400))
end$mon <- 0
end$mday <- 1
- breaks <- seq(start, end, "years")
+ breaks <- seq(start, end, breaks)
} else {
if (length(by2) == 2) incr <- incr * as.integer(by2[1])
maxx <- max(x, na.rm = TRUE)
--- reg-tests-1ORIG.R 2008-03-20 09:18:19.000000000 -0500
+++ reg-tests-1.R 2008-03-20 15:15:56.000000000 -0500
@@ -5025,7 +5025,7 @@
## was about 0.0005 in 2.6.1 patched
-## tests of problems fixed by Marc Schwarz's patch for
+## tests of problems fixed by Marc Schwartz's patch for
## cut/hist for Dates and POSIXt
Dates <- seq(as.Date("2005/01/01"), as.Date("2009/01/01"), "day")
months <- format(Dates, format = "%m")
@@ -5036,20 +5036,32 @@
stopifnot(identical(hist(Dates, "month", plot = FALSE)$counts, mn))
# Test cut.Date() for months
stopifnot(identical(as.vector(table(cut(Dates, "month"))), mn))
+# Test cut.Date() for 3 months
+stopifnot(identical(as.vector(table(cut(Dates, "3 months"))),
+ as.integer(colSums(matrix(c(mn, 0, 0), nrow = 3)))))
# Test hist.Date() for years
stopifnot(identical(hist(Dates, "year", plot = FALSE)$counts, ty))
# Test cut.Date() for years
stopifnot(identical(as.vector(table(cut(Dates, "years"))),ty))
+# Test cut.Date() for 3 years
+stopifnot(identical(as.vector(table(cut(Dates, "3 years"))),
+ as.integer(colSums(matrix(c(ty, 0), nrow = 3)))))
Dtimes <- as.POSIXlt(Dates)
# Test hist.POSIXt() for months
stopifnot(identical(hist(Dtimes, "month", plot = FALSE)$counts, mn))
# Test cut.POSIXt() for months
stopifnot(identical(as.vector(table(cut(Dtimes, "month"))), mn))
+# Test cut.POSIXt() for 3 months
+stopifnot(identical(as.vector(table(cut(Dtimes, "3 months"))),
+ as.integer(colSums(matrix(c(mn, 0, 0), nrow = 3)))))
# Test hist.POSIXt() for years
stopifnot(identical(hist(Dtimes, "year", plot = FALSE)$counts, ty))
# Test cut.POSIXt() for years
stopifnot(identical(as.vector(table(cut(Dtimes, "years"))), ty))
+# Test cut.POSIXt() for 3 years
+stopifnot(identical(as.vector(table(cut(Dtimes, "3 years"))),
+ as.integer(colSums(matrix(c(ty, 0), nrow = 3)))))
## changed in 2.6.2
______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel