Re: [Rd] The function cummax() seems to have a bug.

2015-05-17 Thread Henrik Bengtsson
Below is some further troubleshooting on this:

From code inspection this bug happens for only:

* for integer values
* when the first element is NA_integer_ and the second is not.


Examples:

# Numeric/doubles works as expected
 cummax(c(NA_real_, 0, 1, 2, 3))
[1] NA NA NA NA NA

# It does not occur when the first value is non-NA
 cummax(c(0L, NA_integer_, 1L, 2L, 3L))
[1]  0 NA NA NA NA

# When first value is NA, it is not remembered
# (because internal for loop starts with 2nd element)
 cummax(c(NA_integer_, 0L, 1L, 2L, 3L))
[1] NA  0  1  2  3

The problem is not there for cummin():

 cummin(c(0L, NA_integer_, 1L, 2L, 3L))
[1]  0 NA NA NA NA
 cummin(c(NA_integer_, 0L, 1L, 2L, 3L))
[1] NA NA NA NA NA

but that is just pure luck due to the fact how NA_integer_ is
internally represented as the smallest possible 4-byte signed integer,
i.e.

LibExtern intR_NaInt;   /* NA_INTEGER:= INT_MIN currently */
#define NA_INTEGER  R_NaInt

Note the comment, which implies that code should not rely on the
assumption that NA_integer_ == NA_INTEGER == R_NaInt == INT_MIN; it
could equally well have been INT_MAX, which in case cummin()would
return the wrong result whereas cummax() wouldn't. So, cummin() makes
the same mistake ascummax(), where the for-loop skips the test for NA
of the first element, cf.
https://github.com/wch/r-source/blob/trunk/src/main/cum.c#L145-L148

The simple solution is probably to do (cf. native icumsum):

[HB-X201]{hb}: svn diff src/main/cum.c
Index: src/main/cum.c
===
--- src/main/cum.c  (revision 68378)
+++ src/main/cum.c  (working copy)
@@ -130,7 +130,7 @@
 int *ix = INTEGER(x), *is = INTEGER(s);
 int max = ix[0];
 is[0] = max;
-for (R_xlen_t i = 1 ; i  xlength(x) ; i++) {
+for (R_xlen_t i = 0 ; i  xlength(x) ; i++) {
if(ix[i] == NA_INTEGER) break;
is[i] = max = (max  ix[i]) ? max : ix[i];
 }
@@ -142,7 +142,7 @@
 int *ix = INTEGER(x), *is = INTEGER(s);
 int min = ix[0];
 is[0] = min;
-for (R_xlen_t i = 1 ; i  xlength(x) ; i++ ) {
+for (R_xlen_t i = 0 ; i  xlength(x) ; i++ ) {
if(ix[i] == NA_INTEGER) break;
is[i] = min = (min  ix[i]) ? min : ix[i];
 }

/Henrik

On Sun, May 17, 2015 at 4:13 AM, Dongcan Jiang dongcan.ji...@gmail.com wrote:
 Hi,

 The function cummax() seems to have a bug.

 x - c(NA, 0)
 storage.mode(x) - integer
 cummax(x)
 [1] NA  0

 The correct result of this case should be NA NA. The mistake in
 [https://github.com/wch/r-source/blob/trunk/src/main/cum.c#L130-L136] may be
 the reason.

 Best Regards,
 Dongcan

 --
 Dongcan Jiang
 Team of Search Engine  Web Mining
 School of Electronic Engineering  Computer Science
 Peking University, Beijing, 100871, P.R.China

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] The function cummax() seems to have a bug.

2015-05-17 Thread Dongcan Jiang
Hi,

The function cummax() seems to have a bug.

 x - c(NA, 0)
 storage.mode(x) - integer
 cummax(x)
[1] NA  0

The correct result of this case should be NA NA. The mistake in [
https://github.com/wch/r-source/blob/trunk/src/main/cum.c#L130-L136] may be
the reason.

Best Regards,
Dongcan

-- 
Dongcan Jiang
Team of Search Engine  Web Mining
School of Electronic Engineering  Computer Science
Peking University, Beijing, 100871, P.R.China

[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel