On Wed, 2007-03-14 at 20:16 -0700, Steven McKinney wrote:
> Since you can index a matrix or dataframe with
> a matrix of logicals, you can use is.na()
> to index all the NA locations and replace them
> all with 0 in one command.
>
A quicker solution, which IIRC was posted to the list by Peter Dalgaard several years ago, is:

sapply(mydata.df, function(x) {x[is.na(x)] <- 0; x})

Some timings on a larger problem with 100 columns (the first three figures reported by system.time() are user, system and elapsed time in seconds):

> mydata.df <- as.data.frame(matrix(sample(c(as.numeric(NA), 1), size = 1000*100, replace = TRUE), nrow = 1000))
> system.time(retval <- sapply(mydata.df, function(x) {x[is.na(x)] <- 0; x}))
[1] 0.108 0.008 0.120 0.000 0.000
> system.time(mydata.df[is.na(mydata.df)] <- 0)
[1] 2.460 0.028 2.498 0.000 0.000

And a larger problem still, with 1000 columns:

> mydata.df <- as.data.frame(matrix(sample(c(as.numeric(NA), 1), size = 1000*1000, replace = TRUE), nrow = 1000))
> system.time(retval <- sapply(mydata.df, function(x) {x[is.na(x)] <- 0; x}))
[1] 0.908 0.068 2.657 0.000 0.000
> system.time(mydata.df[is.na(mydata.df)] <- 0)
[1] 43.127 0.332 46.440 0.000 0.000

Profiling mydata.df[is.na(mydata.df)] <- 0 shows that it spends most of its time subsetting the individual cells of the data frame in turn and setting the NA ones to 0.

HTH

G

> > mydata.df <- as.data.frame(matrix(sample(c(as.numeric(NA), 1), size = 30,
> > replace = TRUE), nrow = 6))
> > mydata.df
>   V1 V2 V3 V4 V5
> 1  1 NA  1  1  1
> 2  1 NA NA NA  1
> 3 NA NA  1 NA NA
> 4 NA NA NA NA  1
> 5 NA  1 NA NA  1
> 6  1 NA NA  1  1
> > is.na(mydata.df)
>      V1    V2    V3    V4    V5
> 1 FALSE  TRUE FALSE FALSE FALSE
> 2 FALSE  TRUE  TRUE  TRUE FALSE
> 3  TRUE  TRUE FALSE  TRUE  TRUE
> 4  TRUE  TRUE  TRUE  TRUE FALSE
> 5  TRUE FALSE  TRUE  TRUE FALSE
> 6 FALSE  TRUE  TRUE FALSE FALSE
> > mydata.df[is.na(mydata.df)] <- 0
> > mydata.df
>   V1 V2 V3 V4 V5
> 1  1  0  1  1  1
> 2  1  0  0  0  1
> 3  0  0  1  0  0
> 4  0  0  0  0  1
> 5  0  1  0  0  1
> 6  1  0  0  1  1
>
> Steven McKinney
>
> Statistician
> Molecular Oncology and Breast Cancer Program
> British Columbia Cancer Research Centre
>
> email: [EMAIL PROTECTED]
>
> tel: 604-675-8000 x7561
>
> BCCRC
> Molecular Oncology
> 675 West 10th Ave, Floor 4
> Vancouver B.C.
> V5Z 1L3
> Canada
>
> -----Original Message-----
> From: [EMAIL PROTECTED] on behalf of David L. Van Brunt, Ph.D.
> Sent: Wed 3/14/2007 5:22 PM
> To: R-Help List
> Subject: [R] replacing all NA's in a dataframe with zeros...
>
> I've seen how to replace the NA's in a single column of a data frame
>
> > mydata$ncigs[is.na(mydata$ncigs)]<-0
>
> But this is just one column... I have thousands of columns (!) that I need
> to do this, and I can't figure out a way, outside of the dreaded loop, to
> replace all NA's in an entire data frame (all vars) without naming each var
> separately. Yikes.
>
> I'm racking my brain on this, seems like I must be staring at the obvious,
> but it eludes me. Searches have come up CLOSE, but not quite what I need.
>
> Any pointers?

--
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Gavin Simpson                 [t] +44 (0)20 7679 0522
 ECRC                          [f] +44 (0)20 7679 0565
 UCL Department of Geography
 Pearson Building              [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London, UK      [w] http://www.ucl.ac.uk/~ucfagls/
 WC1E 6BT                      [w] http://www.freshwaters.org.uk/
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
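A caveat on the sapply() one-liner: sapply() simplifies its result to a matrix, so retval is no longer a data frame, and columns of mixed types would be coerced to a common type. The sketch below is an illustration rather than code from the thread. It assumes every column is numeric, uses a hypothetical helper na_to_zero(), and writes the lapply() result back with mydata.df[] <- so that the data frame class, names and row names are kept. set.seed() is added only to make the example reproducible, and the result is checked against the logical-matrix replacement from Steven's example.

```r
## Sketch only: assumes every column of mydata.df is numeric.
## na_to_zero() is a hypothetical helper name, not part of base R.
na_to_zero <- function(x) {
  x[is.na(x)] <- 0   # replace the NAs in this one column
  x
}

## Small example in the style of the thread; set.seed() only for reproducibility
set.seed(1)
mydata.df <- as.data.frame(matrix(sample(c(as.numeric(NA), 1), size = 30,
                                         replace = TRUE), nrow = 6))

## Reference result using the logical-matrix approach
ref <- mydata.df
ref[is.na(ref)] <- 0

## Column-wise replacement; assigning into mydata.df[] keeps the data frame
## structure (class, names, row names) instead of returning a matrix
mydata.df[] <- lapply(mydata.df, na_to_zero)

is.data.frame(mydata.df)    # TRUE
any(is.na(mydata.df))       # FALSE
all.equal(mydata.df, ref)   # TRUE
```

The per-column work is the same as in the sapply() version; only the way the columns are reassembled differs.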