>>>>> Harvey Smith
>>>>>     on Wed, 1 May 2019 03:20:55 -0400 writes:

    > Inside of the anyNA() function, it will use the legacy any(is.na()) code if
    > x is an OBJECT(). If x is a vector of POSIXct, it will be an OBJECT(), but
    > it is also TYPEOF(x) == REALSXP. Therefore, it will skip the faster
    > ITERATE_BY_REGION, which is typically 5x faster in my testing.

    > Is the OBJECT() condition really necessary, or could it be moved after the
    > switch() for the individual TYPEOF(x) ITERATE_BY_REGION calls?

"necessary ?" : yes, in the following sense :

When it was introduced, the idea of anyNA(.) was that it should be
equivalent to (but often faster than) any(is.na(.)).
anyNA() was only introduced quite recently (*), and many (S3 and S4)
classes have had is.na() methods defined for them but
-- initially at least -- no anyNA() method.

So, to ensure the equivalence

    anyNA(x)  ===  any(is.na(x))

for "all" R objects 'x', that OBJECT(.) condition has been
important and necessary.

Still, being the person who added anyNA() to R, I'm naturally
sympathetic to making it faster in cases such as "Date" or "POSIXct"
objects.

I'd find it ugly to test for these classes specifically in the C code
(via the equivalent of inherits(., "POSIXct")
 {{ *NOT* via the really wrong  class(.)[[1]] == "POSIXct"
    that I see in some "experts" R code, because that fails for
    all class extensions ! }}
) but that may still be an option;

Yet alternatively, one *could* consider changing the API and
declare that for atomic types with a class {i.e. OBJECT(.)},
*if* there is no anyNA() method, anyNA() will use the "atomic" fast
method instead of any(is.na(.)).

This may break existing code in packages, but the maintainers of
that code could solve the problems by providing anyNA(.) methods
for their objects.

Other opinions / ideas ?
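
To make the two points above a bit more concrete, here is a rough
R-level sketch. It is *not* the C implementation and not something R
currently does; has_anyNA_S3() and anyNA_fast() are made-up names, the
"myTime" class is invented for illustration, and S4 methods are
ignored for brevity.

```r
## Hypothetical helper (not part of R): does any class of 'x' have an S3
## anyNA() method?  S4 methods are ignored to keep the sketch short.
has_anyNA_S3 <- function(x) {
  for (cl in class(x))
    if (!is.null(utils::getS3method("anyNA", cl, optional = TRUE)))
      return(TRUE)
  FALSE
}

## Hypothetical R-level version of the proposed rule: classed atomic
## vectors without an anyNA() method take the "atomic" fast path.
anyNA_fast <- function(x) {
  if (is.object(x) && is.atomic(x) && !has_anyNA_S3(x))
    anyNA(unclass(x))   # avoids the any(is.na(.)) fallback
  else
    anyNA(x)            # current behaviour
}

x <- Sys.time() + 1:5
x[3] <- NA
## invented class extension of "POSIXct"
y <- structure(x, class = c("myTime", class(x)))

class(y)[[1]] == "POSIXct"  # FALSE: misses the class extension
inherits(y, "POSIXct")      # TRUE

## for these objects the fast path agrees with any(is.na(.))
stopifnot(identical(anyNA_fast(x), any(is.na(x))),
          identical(anyNA_fast(y), any(is.na(y))))
```

Under such a rule, a class whose is.na() method disagrees with the
underlying atomic data would need its own anyNA() method, which is the
compatibility cost mentioned above.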

Martin Maechler
ETH Zurich / R Core Team

--
*) in Spring 2013, but too late for R 3.0.0;
   "recently", considering R's history starting with S in the early 1980's

    > # script to demonstrate performance difference if x is an OBJECT or not by using unclass()
    > x.posixct = Sys.time() + 1:1e6
    > microbenchmark::microbenchmark(
    >   any(is.na( x.posixct )),
    >   anyNA( x.posixct ),
    >   anyNA( unclass(x.posixct) ),
    >   unit='ms')
    >
    >
    >
    > static Rboolean anyNA(SEXP call, SEXP op, SEXP args, SEXP env)
    > {
    >     SEXP x = CAR(args);
    >     SEXPTYPE xT = TYPEOF(x);
    >     Rboolean isList = (xT == VECSXP || xT == LISTSXP), recursive = FALSE;
    >
    >     if (isList && length(args) > 1) recursive = asLogical(CADR(args));
    >     *if (OBJECT(x) || (isList && !recursive)) {*
    >         SEXP e0 = PROTECT(lang2(install("is.na"), x));
    >         SEXP e = PROTECT(lang2(install("any"), e0));
    >         SEXP res = PROTECT(eval(e, env));
    >         int ans = asLogical(res);
    >         UNPROTECT(3);
    >         return ans == 1; // so NA answer is false.
    >     }
    >
    >     R_xlen_t i, n = xlength(x);
    >     switch (xT) {
    >     case REALSXP:
    >     {
    >         if(REAL_NO_NA(x))
    >             return FALSE;
    >         ITERATE_BY_REGION(x, xD, i, nbatch, double, REAL, {
    >             for (int k = 0; k < nbatch; k++)
    >                 if (ISNAN(xD[k]))
    >                     return TRUE;
    >         });
    >         break;
    >     }

    > ______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel