Here are a couple of documents that make much the same point (e.g. "mean value imputation is not recommended"), and discuss several alternatives.
http://nces.ed.gov/statprog/2002/appendixb3.asp
http://www2.chass.ncsu.edu/garson/pa765/missing.htm

I think we'd need more information on the context to provide any real
advice. Another possible source of help is the Impute mailing list:
http://lists.utsouthwestern.edu/mailman/listinfo/impute

Cheers,
James
--
James Reilly
Department of Statistics, University of Auckland
Private Bag 92019, Auckland, New Zealand

On 31/01/2006 6:20 a.m., Berton Gunter wrote:
> Lots of other folks will give you the simple answer (hint: ?'[' ?is.na)
>
> Yours is one of those "iceberg" questions -- 2/3 hidden underwater.
>
> Two points:
>
> Point 1: Generally you **don't have to do such replacement** as most of R's
> functions have a na.rm or na.action argument (unfortunately, for historical
> reasons, the argument names and meanings aren't consistent) that does
> basically what you want anyway.
>
> Point 2: Doing what you ask is probably a bad idea, as it creates mythical
> degrees of freedom and biases results --> gives wrong statistical answers.
>
> As a general matter, handling missing values "correctly" is a difficult
> statistical issue that you may want to avoid if you can (R has plenty of
> packages that can deal with it, but it requires background expertise).
> Honestly, I'm not sure "if you can" makes any sense here (how do you know?),
> but let's just say that I think your potential for mischief is reduced if
> you use R's inbuilt arguments for ignoring missings rather than imputing
> them naively.
>
> Having said that, I believe that clustering procedures, for example, may not
> permit this (but they have builtin missing imputation capabilities of their
> own, do they not?), so you may have to impute. In this case, try to do so
> wisely (e.g. via multiple imputation?).
>
> Perhaps this will stimulate real experts to offer you some advice. Good
> luck.
>
> Cheers,
> Bert
>
> Bert Gunter
> Genentech
>
>> -----Original Message-----
>> From: [EMAIL PROTECTED]
>> [mailto:[EMAIL PROTECTED] On Behalf Of Julie Bernauer
>> Sent: Monday, January 30, 2006 8:50 AM
>> To: [email protected]
>> Subject: [R] handling NA by mean replacement
>>
>> Hello
>>
>> I am sorry for such a stupid question. Suppose I have a table of data
>> having a lot of NAs and I want to replace those NAs by the mean of the
>> column before NA replacement. How is it possible to do that efficiently?
>>
>> Thanks in advance,
>>
>> Julie
>>
>> --
>> Julie Bernauer
>> Yeast Structural Genomics
>> http://www.genomics.eu.org
>>
>> ______________________________________________
>> [email protected] mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide!
>> http://www.R-project.org/posting-guide.html
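
For concreteness, here is a minimal R sketch of the two routes discussed in this thread: letting a function skip the NAs via na.rm (Bert's Point 1), and the literal column-mean replacement that was asked for, using `[` and is.na as hinted. The data frame and the names df, col_means and imputed are made up for illustration; this is only a sketch and not a recommendation to impute this way, given the caveats in Point 2.

```r
# Toy data frame with missing values (values are invented for illustration).
df <- data.frame(a = c(1, NA, 3, 5), b = c(NA, 10, 20, NA))

# Point 1: many functions can skip NAs themselves, so no replacement is needed.
col_means <- colMeans(df, na.rm = TRUE)

# The literal request: replace each NA by the mean of its column,
# computed before any replacement takes place.
imputed <- df
for (j in names(imputed)) {
  missing <- is.na(imputed[[j]])          # logical index of the NAs
  imputed[[j]][missing] <- col_means[[j]]
}

# Point 2 in miniature: the column mean is preserved, but the spread is
# understated once the NAs have been filled in with the mean.
sd_observed <- sd(df$a, na.rm = TRUE)
sd_imputed  <- sd(imputed$a)
```

Comparing sd_observed with sd_imputed shows the shrinkage: the filled-in values sit exactly at the mean, so they add apparent observations without adding any variability.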
