After 5 minutes more thought:

- Code non-missing values as missingKind = NA rather than 0. That way missingKind could be a character vector, or missingKind = 0 could be supported as a genuine kind.

- Print methods should return their main argument invisibly, so mine should be:

print.MultiMissing <- function(x, ...) {
  vals <- as.character(x)
  if (!is.character(x) || inherits(x, "noquote"))
    print(noquote(vals))
  else
    print(vals)
  invisible(x)
}

This still needs a lot of improvement to be a good print method, but I'll leave that to you.
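
To illustrate the first point above, here is a minimal sketch of coding non-missing entries as NA in the attribute. The labels "refused" and "not applicable" are made up for the example, and nothing here is part of the implementation quoted below; it only shows that the attribute can then be a character vector and that is.na() on the attribute identifies the non-missing entries.

```r
# Hypothetical data: two observed values and two missing ones.
x <- c(10, NA, 30, NA)

# NA in the attribute means "not missing"; any other value names a kind.
# The labels are invented for this illustration.
kind <- c(NA, "refused", NA, "not applicable")
x <- structure(x, missingKind = kind)

# Entries with a non-NA kind are exactly the missing data values.
has_kind <- !is.na(attr(x, "missingKind"))
data_missing <- is.na(unclass(x))

# With NA reserved for "not missing", 0 becomes usable as a real kind.
y <- structure(c(1, NA, 3), missingKind = c(NA, 0, NA))
```

Under this coding, has_kind and data_missing should agree element by element.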

Duncan Murdoch

On 26/05/2021 11:43 a.m., Duncan Murdoch wrote:
On 26/05/2021 10:22 a.m., Adrian Dușa wrote:
Dear Duncan,

On Wed, May 26, 2021 at 2:27 AM Duncan Murdoch <murdoch.dun...@gmail.com> wrote:

     You've already been told how to solve this:  just add attributes to the
     objects. Use the standard NA to indicate that there is some kind of
     missingness, and the attribute to describe exactly what it is.  Stick a
     class on those objects and define methods so that subsetting and
     arithmetic preserves the extra info you've added. If you do some
     operation that turns those NAs into NaNs, big deal:  the attribute will
     still be there, and is.na(NaN) still returns TRUE.
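
A small sketch of the behaviour described in the paragraph above, using a plain numeric vector with an attribute and no class. The attribute name missingKind and the code 5 are assumptions chosen for the illustration. It shows that arithmetic keeps the attribute, that a NaN still counts as missing, and that default subsetting drops the attribute, which is why a `[` method is needed.

```r
# A numeric vector with one missing value of (hypothetical) kind 5.
x <- structure(c(1, NA, 3), missingKind = c(0L, 5L, 0L))

# Arithmetic copies the attributes of its operand to the result.
y <- x * 2
kinds_after_arith <- attr(y, "missingKind")

# Replace the NA by NaN: the attribute stays, and is.na() is still TRUE.
z <- x
z[2] <- NaN
still_missing <- is.na(z)[2]
kinds_after_nan <- attr(z, "missingKind")

# Default subsetting keeps only names, dims and dimnames, so the
# extra attribute is lost unless a `[` method restores it.
kinds_after_subset <- attr(x[2:3], "missingKind")
```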


I've already tried the attributes way; it is not so easy.

If you have specific operations that are needed but that you can't get
to work, post the issue here.

In the best case scenario, it unnecessarily triples the size of the
data, but perhaps this is the only way forward.

I don't see how it could triple the size.  Surely an integer has enough
values to cover all possible kinds of missingness.  So on integer or
factor data you'd double the size, on real or character data you'd
increase it by 50%.  (This is assuming you're on a 64 bit platform with
32 bit integers and 64 bit reals and pointers.)
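
The size estimate above can be checked roughly with object.size(). This is only a sketch: it assumes the kinds are stored as an integer vector in an attribute named missingKind, and the vector length of one million is arbitrary.

```r
n <- 1e6

reals <- numeric(n)   # 8 bytes per element
ints  <- integer(n)   # 4 bytes per element
kinds <- integer(n)   # one integer code per element

# Attach the codes as an attribute, as in the approach discussed here.
reals_tagged <- structure(reals, missingKind = kinds)
ints_tagged  <- structure(ints,  missingKind = kinds)

# object.size() counts attributes, so the ratios show the overhead.
ratio_real <- as.numeric(object.size(reals_tagged)) /
              as.numeric(object.size(reals))
ratio_int  <- as.numeric(object.size(ints_tagged)) /
              as.numeric(object.size(ints))
```

On a 64-bit platform ratio_real should come out close to 1.5 and ratio_int close to 2, apart from a small fixed overhead for the attribute list.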

Here's a tiny implementation to show what I'm talking about:

asMultiMissing <- function(x) {
    if (isMultiMissing(x))
      return(x)
    missingKind <- ifelse(is.na(x), 1, 0)
    structure(x,
              missingKind = missingKind,
              class = c("MultiMissing", class(x)))
}

isMultiMissing <- function(x)
    inherits(x, "MultiMissing")

missingKind <- function(x) {
    if (isMultiMissing(x))
      attr(x, "missingKind")
    else
      ifelse(is.na(x), 1, 0)
}

`missingKind<-` <- function(x, value) {
    class(x) <- setdiff(class(x), "MultiMissing")
    x[value != 0] <- NA
    x <- asMultiMissing(x)
    attr(x, "missingKind") <- value
    x
}

`[.MultiMissing` <- function(x, i, ...) {
    missings <- missingKind(x)
    x <- NextMethod()
    missings <- missings[i]
    missingKind(x) <- missings
    x
}

print.MultiMissing <- function(x, ...) {
    vals <- as.character(x)
    if (!is.character(x) || inherits(x, "noquote"))
      print(noquote(vals))
    else
      print(vals)
}

`[<-.MultiMissing` <- function(x, i, value, ...) {
    missings <- missingKind(x)
    class(x) <- setdiff(class(x), "MultiMissing")
    x[i] <- value
    missings[i] <- missingKind(value)
    missingKind(x) <- missings
    x
}

as.character.MultiMissing <- function(x, ...) {
    missings <- missingKind(x)
    result <- NextMethod()
    ifelse(missings != 0,
           paste0("NA.", missings), result)
}

This is incomplete.  It doesn't do printing very well, and it doesn't
handle the case of assigning a MultiMissing value to a regular vector at
all.  (I think you'd need an S4 implementation if you want to support
that.)  But it does the basics:

  > x <- 1:10
  > missingKind(x)[4] <- 23
  > x
   [1] 1     2     3     NA.23 5     6     7     8     9
[10] 10
  > is.na(x)
   [1] FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE
[10] FALSE
  > missingKind(x)
   [1]  0  0  0 23  0  0  0  0  0  0
  >
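
On the limitation mentioned above (assigning a MultiMissing value into a regular vector): S3 dispatch for `[<-` looks only at the class of the object being assigned into, not at the class of the value. The sketch below uses a made-up class "Tagged" and a small logging environment, both hypothetical, to show that the method is never called when the target is a plain vector.

```r
# Environment used only to record whether the method ran.
log <- new.env()
log$called <- FALSE

# Hypothetical class "Tagged", defined only for this illustration.
`[<-.Tagged` <- function(x, i, value) {
  log$called <- TRUE
  NextMethod()
}

tagged <- structure(NA_integer_, class = "Tagged")
plain  <- 1:3

# The target is a plain integer vector, so the default method runs
# and the Tagged method never sees the assignment.
plain[2] <- tagged
called_for_plain <- log$called

# With a Tagged target, the method is dispatched.
tagged[1] <- 5L
called_for_tagged <- log$called
```

Because the value's class is ignored here, any extra information carried by the value is lost unless the target already has the class.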

Duncan Murdoch


     Base R doesn't need anything else.

     You complained that users shouldn't need to know about attributes, and
     they won't:  you, as the author of the package that does this, will
     handle all those details.  Working in your subject area you know all
     the different kinds of NAs that people care about, and how they code
     them in input data, so you can make it all totally transparent.  If
     you do it well, someone in some other subject area with a completely
     different set of kinds of missingness will be able to adapt your code
     to their use.


But that is the whole point: the package author does not define possible
NAs (the possibilities are infinite), users do that.
The package should only provide a simple method to achieve that.


     I imagine this has all been done in one of the thousands of packages on
     CRAN, but if it hasn't been done well enough for you, do it better.


If it were, I would have found it by now...

Best wishes,
Adrian


______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
