Those of you who track R development closely will have noticed yesterday's commit of enhanced versions of nchar() and nzchar().

------------------------------------------------------------------------
r68254 | maechler | 2015-04-23 18:06:37 +0200 (Thu, 23 Apr 2015) | 1 line
Changed paths:
   M doc/NEWS.Rd
   M src/library/base/R/New-Internal.R
   M src/library/base/R/zzz.R
   M src/library/base/man/nchar.Rd
   M src/main/character.c
   M src/main/names.c
   M tests/reg-tests-1a.R

nchar(x) now gives NA for character NAs, configurably via nchar(x, keepNA=*); analogously for nzchar()
------------------------------------------------------------------------

Enhanced via the new argument 'keepNA' (a logical, i.e., TRUE/FALSE/NA), but also *not* backward compatible in the current implementation.

Here's how it works [currently], showing the input and output of the (slightly abridged) example(nchar):

  > x <- c("asfef", "qwerty", "yuiop[", "b", "stuff.blah.yech")
  > x[3] <- NA; x
  [1] "asfef"           "qwerty"          NA                "b"
  [5] "stuff.blah.yech"
  > nchar(x, keepNA= TRUE) #  5  6 NA  1 15
  [1]  5  6 NA  1 15
  > nchar(x, keepNA=FALSE) #  5  6  2  1 15
  [1]  5  6  2  1 15
  > stopifnot(identical(nchar(x     ), nchar(x, keepNA= TRUE)),
              identical(nchar(x, "w"), nchar(x, keepNA=FALSE)))
  >

The main reason for the change: it is more logical that NA_character_ entries in x are transformed to NA_integer_ in the result, which is what happens with 'keepNA = TRUE', which can be translated as "keep/preserve the NA's that were in x (the main argument)".

If you use nchar(x, type = "width"), or its short form nchar(x, "w"), you implicitly ask for 'keepNA = FALSE', because "width" is about output / formatting / etc, and there, you'd typically want nchar(c("ABC", NA), "width") to give 3 2 -- which is what happens unconditionally in R <= 3.2.0.
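
Code that has to give the same answer under both the old (R <= 3.2.0) and the new semantics can avoid depending on what nchar() returns for NA at all, by treating the missing values explicitly. The following is only a minimal sketch of that idea, not part of the commit; the helper name nchar_na() and its na_value argument are made up for illustration.

```r
# Hypothetical helper (not in base R): count characters with explicit
# NA handling, so the result does not depend on nchar()'s NA behaviour.
nchar_na <- function(x, na_value = NA_integer_) {
  x <- as.character(x)
  # Start with the value chosen for missing entries ...
  out <- rep.int(as.integer(na_value), length(x))
  # ... and call nchar() only on the non-missing strings.
  ok <- !is.na(x)
  out[ok] <- nchar(x[ok], type = "chars")
  out
}

x <- c("asfef", "qwerty", NA, "b", "stuff.blah.yech")
res_keep <- nchar_na(x)       # NA stays NA, like keepNA = TRUE
res_old  <- nchar_na(x, 2L)   # NA counted as 2, like keepNA = FALSE
```

Here nchar_na(x) mirrors the 'keepNA = TRUE' result shown above, and nchar_na(x, 2L) mirrors the 'keepNA = FALSE' one; nchar() itself is only ever called on non-missing strings.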

We've found quite a few CRAN packages to "break" (R CMD check) for R-devel r68254, because I had clearly underestimated the number of places where current R code was built on assuming the "pre-R-devel" (aka "current R") semantics of nchar() and nzchar(), which for R <= 3.2.0 say

  Value:

     For ‘nchar’, an integer vector giving the sizes of each element,
     __currently__ always ‘2’ for missing values (for ‘NA’).

(my emphasis added to "currently").

As package authors, when using R-devel you may wait a day when you see problems with R-devel (that you don't see with R 3.2.0), but you should become aware of the slightly changed semantics of nchar() and nzchar().

Longer term, the change should have made R more "internally coherent", namely vectorized R functions preserving NA's by default.

Martin Maechler,
ETH Zurich

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel