Re: [Rd] 1954 from NA

Greg Minshall Mon, 24 May 2021 04:12:10 -0700

Adrian,

> If it was only one column then your solution is neat. But with 5-600
> variables, each of which can contain multiple missing values, to
> double this number of variables just to describe NA values seems to me
> excessive.  Not to mention we should be able to quickly convert /
> import / export from one software package to another. This would imply
> maintaining some sort of metadata reference of which explanatory
> additional factor describes which original variable.


one thing *i* should keep in mind is the old saying: "The difference
between theory and practice is that in theory there is no difference,
but in practice, there is."

but, in theory:

if you have 500 columns of possibly-NA'd variables, you could have one
extra column holding 500 "bits" per row, where each "bit" takes one of N
values, N being the number of explanations the corresponding column has
for why the NA exists.
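
a rough sketch of that one-column idea, not anything proposed in the
thread itself. it assumes each "bit" is a small fixed-width integer
code, with code 0 reserved for "not missing" (an added assumption) and
codes 1..N standing for the explanations. the function names and the
reason labels are hypothetical, and Python is used only to keep the
example self-contained:

```python
# Hypothetical reason labels; index 0 is reserved for "not missing".
REASONS = ["not missing", "refused", "not applicable", "don't know"]


def bits_needed(n_reasons):
    """Width in bits for codes 0..n_reasons (0 means "not missing")."""
    return max(1, n_reasons.bit_length())


def pack_reasons(codes, width):
    """Pack one row's per-column reason codes into a single integer."""
    packed = 0
    for column, code in enumerate(codes):
        if not 0 <= code < (1 << width):
            raise ValueError(f"code {code} does not fit in {width} bits")
        # Column i occupies bits [i * width, (i + 1) * width).
        packed |= code << (column * width)
    return packed


def unpack_reason(packed, column, width):
    """Recover the reason code stored for one column."""
    return (packed >> (column * width)) & ((1 << width) - 1)


# One row with five variables: columns 1, 3 and 4 are NA for
# different reasons, columns 0 and 2 hold real values.
width = bits_needed(len(REASONS) - 1)
row_codes = [0, 2, 0, 1, 3]
packed_row = pack_reasons(row_codes, width)
decoded = [REASONS[unpack_reason(packed_row, c, width)] for c in range(5)]
```

the same packing works unchanged for 500 columns, since a Python
integer grows as needed; in a fixed-width setting the packed value
would have to be split across several words.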

i guess the CS'y thing that comes to my mind here is that one thing is
the *semantics* of what you are trying to convey, and the other is how
those semantics are *encoded* in whatever representation you are using.

cheers, Greg

______________________________________________
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
