Adrian,

> If it was only one column then your solution is neat. But with 5-600
> variables, each of which can contain multiple missing values, to
> double this number of variables just to describe NA values seems to me
> excessive. Not to mention we should be able to quickly convert /
> import / export from one software package to another. This would imply
> maintaining some sort of metadata reference of which explanatory
> additional factor describes which original variable.

one thing *i* should keep in mind is the old saying: "The difference between theory and practice is that in theory there is no difference, but in practice, there is."

but, in theory: if you have 500 columns of possibly-NA'd variables, you could have one column of 500 "bits", where each bit has one of N values, N being the number of explanations the corresponding column has for why the NA exists.

i guess the CS'y thing that comes to my mind here is that one thing is the *semantics* of what you are trying to convey, and the other is how those semantics are *encoded* in whatever representation you are using.

cheers, Greg

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
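
The single side-column idea above can be sketched roughly as follows. This is an illustration only, written in Python rather than R, and everything in it is an assumption made for the example: the reason labels in `REASONS`, the helper names `pack_reasons` and `reason_for`, and the choice of a fixed-width bit field per variable. Each row carries one packed integer holding a small reason code for every variable, with code 0 meaning the value is present.

```python
# Hypothetical reason codes; 0 means "value present", the rest explain an NA.
REASONS = ["present", "refused", "not applicable", "don't know"]

# Bits needed to store one reason code (2 bits for the 4 codes above).
BITS = max(1, (len(REASONS) - 1).bit_length())


def pack_reasons(codes):
    """Pack one row's per-variable reason codes into a single integer."""
    packed = 0
    for position, code in enumerate(codes):
        if not 0 <= code < len(REASONS):
            raise ValueError(f"unknown reason code: {code}")
        # Each variable owns its own BITS-wide slot in the packed value.
        packed |= code << (position * BITS)
    return packed


def reason_for(packed, position):
    """Recover the reason label for the variable at `position`."""
    mask = (1 << BITS) - 1
    return REASONS[(packed >> (position * BITS)) & mask]


# One row with 500 variables: variable 3 was refused,
# variable 499 was not applicable, all others are present.
row_codes = [0] * 500
row_codes[3] = 1
row_codes[499] = 2
packed_row = pack_reasons(row_codes)
```

Calling `reason_for(packed_row, 3)` gives back the label for variable 3 without any per-variable companion column. The meaning (which explanation applies to which variable) stays the same whether the encoding is a packed integer, a string of 500 characters, or 500 extra factor columns; only the representation changes.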