On 2023-06-05 9:27 a.m., Martin Maechler wrote:
Ben Bolker
     on Sat, 3 Jun 2023 13:06:41 -0400 writes:

     > format(c(1:2, NA)) gives the last value as "NA" rather than
     > preserving it as NA, even if na.encode = FALSE (which does the
     > 'expected' thing for character vectors, but not numeric vectors).

     > This was already brought up in 2008 in
     > https://bugs.r-project.org/show_bug.cgi?id=12318 where Gregor Gorjanc
     > pointed out the issue. Documentation was added and the bug closed as
     > invalid. GG ended with:

     >> IMHO it would be better that na.encode argument would also have an
     >> effect for numeric like vectors. Nearly any function in R returns NA
     >> values and I expected the same for format, at least when na.encode=FALSE.

     > I agree!

I do too, at least "in principle", keeping in mind that
backward compatibility is also an important principle ...

Not sure if the 'na.encode' argument should matter or possibly a
new optional argument, but "in principle" I think that

   format(c(1:2, NA, 4))

should preserve is.na(.) even by default.

I would say it should preserve `is.na` *only* if na.encode = FALSE - that seems like the minimal appropriate change away from the current behaviour.
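
To make that concrete, here is a minimal sketch of the behaviour I have in mind. The wrapper `format_keep_na()` is a hypothetical name, not part of base R, and it only post-processes the result of format(); a real patch to format.default() would presumably also need to decide whether the common field width should still account for the "NA" strings.

```r
x <- c(1:2, NA)

# Current behaviour: the numeric NA is encoded as the string "NA",
# regardless of na.encode.
num_out <- format(x, na.encode = FALSE)
anyNA(num_out)   # FALSE

# For character input, na.encode = FALSE keeps the NA.
chr_out <- format(as.character(x), na.encode = FALSE)
is.na(chr_out)   # FALSE FALSE TRUE

# Hypothetical wrapper (an assumption for illustration only): restore NA
# after formatting, but only when the caller asked for na.encode = FALSE.
# Note that is.na() is also TRUE for NaN, so NaN would become NA here too.
format_keep_na <- function(x, ..., na.encode = TRUE) {
  out <- format(x, ..., na.encode = na.encode)
  if (!na.encode) out[is.na(x)] <- NA_character_
  out
}

kept <- format_keep_na(x, na.encode = FALSE)
is.na(kept)      # FALSE FALSE TRUE
```

With the default na.encode = TRUE the wrapper returns exactly what format() returns today, which is the sense in which this would be the minimal change.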


     > I encountered this in the context of printing a data frame with
     > na.print = "", which works as expected when printing the individual
     > columns but not when printing the whole data frame (because
     > print.data.frame calls format.data.frame, which calls format.default
     > ...).  Example below.

     > It's also different from what you would get if you converted to
     > character before formatting and printing:

     > print(format(as.character(c(1:2, NA)), na.encode=FALSE), na.print ="")

     > Everything about this is documented (if you look carefully enough),
     > but IMO it violates the principle of least surprise
     > https://en.wikipedia.org/wiki/Principle_of_least_astonishment , so I
     > would call it at least an 'infelicity' (sensu Bill Venables)

     > Is there any chance that this design decision could be revisited?

We'd have to hear other opinions / gut feelings.

Also, someone (not me) would ideally volunteer to run
'R CMD check <pkg>' for a few 1000 (not necessarily all) CRAN &
BioC packages with an accordingly patched version of R-devel
(I might volunteer to create such a branch, e.g., a bit before the R
  Sprint 2023 end of August).

I might be willing to do that, although it would be nice if there were a pre-existing framework (analogous to r-lib/revdepcheck) for automating it and collecting the results ...




     > cheers
     > Ben Bolker


     > ---

The following issue you are raising
may really be a *different* one, as it involves format() and
print() methods for "data.frame", i.e.,

    format.data.frame() vs
     print.data.frame()

which is quite a bit related, of course, to how 'numeric'
columns are formatted -- as you note yourself below;
I vaguely recall that the data.frame method could be an even
"harder problem" .. but I don't remember the details.

It may also be that there are no changes necessary to the
*.data.frame() methods, and only the documentation (you mention)
should be updated ...


I *think* that if format.default() were changed so that na.encode=FALSE also applied to numeric types, then data frame printing would naturally work 'right' (since print.data.frame calls format.data.frame, which calls format() for the individual columns, specifying na.encode=FALSE ...)
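
A rough sketch of why I think so, which emulates the proposed change rather than implementing it: it formats the data frame the way print.data.frame() does, then puts the NAs back by hand (the `patched` matrix is my stand-in for what a changed format.default() might return, not actual R-devel behaviour).

```r
dd <- data.frame(f = factor(1:2), c = as.character(1:2),
                 n = as.numeric(1:2), i = 1:2)
dd[3, ] <- rep(NA, 4)

# What print.data.frame() works with today: the factor and character
# columns keep their NA, the numeric and integer columns get "NA" strings.
cur <- as.matrix(format.data.frame(dd, na.encode = FALSE))
is.na(cur)[3, ]

# Emulate the proposed behaviour (assumption: format() would preserve NA
# for every column type when na.encode = FALSE).
patched <- cur
patched[is.na(dd)] <- NA_character_

# na.print now applies to all four columns.
print(patched, na.print = "", quote = FALSE)
```

So no change to the *.data.frame() methods themselves should be needed in this scenario; the NA entries would reach the matrix print step intact and na.print would take care of the rest.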

Martin

     > Consider

     > dd <- data.frame(f = factor(1:2), c = as.character(1:2), n =
     > as.numeric(1:2), i = 1:2)
     > dd[3,] <- rep(NA, 4)
     > print(dd, na.print = "")


     > print(dd, na.print = "")
     >   f c  n  i
     > 1 1 1  1  1
     > 2 2 2  2  2
     > 3     NA NA

     > This is in fact as documented (see below), but seems suboptimal given
     > that printing the columns separately with na.print = "" would
     > successfully print the NA entries as blank even in the numeric columns:

     > invisible(lapply(dd, print, na.print = ""))
     > [1] 1 2
     > Levels: 1 2
     > [1] "1" "2"
     > [1] 1 2
     > [1] 1 2

     > * ?print.data.frame documents that it calls format() for each column
     > before printing
     > * the code of print.data.frame() shows that it calls format.data.frame()
     > with na.encode = FALSE
     > * ?format.data.frame specifically notes that na.encode "only applies to
     > elements of character vectors, not to numerical, complex nor logical
     > ‘NA’s, which are always encoded as ‘"NA"’."

     > So the NA values in the numeric columns become "NA" rather than
     > remaining as NA values, and are thus printed rather than being affected
     > by the na.print argument.

     > ______________________________________________
     > R-devel@r-project.org mailing list
     > https://stat.ethz.ch/mailman/listinfo/r-devel
