Hi David, Hi Bert,

many thanks for the valuable discussion on NA in R (please see the extract below). I follow your arguments for leaving NA values as they are most of the time. On special occasions, however, I want to replace the NA values with another value. To preserve this newly acquired knowledge, I wrote this function:

-- cut --
t_replace_na <- function(dataset, variable, value) {
  if(inherits(dataset[[variable]], "factor") == TRUE) {
    dataset[variable] <- as.character(dataset[variable])
    print(class(dataset[variable]))
    dataset[, variable][is.na(dataset[, variable])] <- value
    dataset[variable] <- as.factor(dataset[variable])
    print(class(dataset[variable]))
  } else {
    dataset[, variable][is.na(dataset[, variable])] <- value
  }
  return(dataset)
}

ds_test <- data.frame(a=c(1,NA,2), b = rep(NA,3), c = c("A","b",NA))
print(sapply(ds_test, class))

t_replace_na(ds_test, "a", value = -1)
t_replace_na(ds_test, "b", value = -2)
t_replace_na(ds_test, "c", value = -3)
-- cut --

Unfortunately, the if-statement does not work because the class reported inside the function is wrong. To find out what is going on, I did this:

-- cut --
test_class <- function(dataset, variable) {
  if(inherits(dataset[, variable], "factor") == TRUE) {
    return(c(class(dataset[variable]), TRUE))
  } else {
    return(c(class(dataset[variable]), FALSE))
  }
}

ds_test <- data.frame(a=c(1,NA,2), b = rep(NA,3), c = c("A","b",NA))
print(sapply(ds_test, class))

# -- Test a --
class(ds_test[, "a"])
if(inherits(ds_test[, "a"], "factor")) {
  print(c(class(ds_test[, "a"]), "TRUE"))
} else {
  print(c(class(ds_test[, "a"]), "FALSE"))
}
test_class(ds_test, "a")
warning("'a' should be numeric NOT data.frame!")

# -- Test b --
if(inherits(ds_test[, "b"], "factor")) {
  print(c(class(ds_test[, "b"]), "TRUE"))
} else {
  print(c(class(ds_test[, "b"]), "FALSE"))
}
class(ds_test[, "b"])
test_class(ds_test, "b")
warning("'b' should be logical NOT data.frame!")

# -- Test c --
if(inherits(ds_test[, "c"], "factor")) {
  print(c(class(ds_test[, "c"]), "TRUE"))
} else {
  print(c(class(ds_test[, "c"]), "FALSE"))
}
class(ds_test[, "c"])
test_class(ds_test, "c")
warning("'c' should be factor NOT data.frame. In addition data.frame != factor")
-- cut --

Why do I get different results from the same call depending on whether it is inside or outside my own function definition?
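
The discrepancy can probably be reproduced without any wrapper function. The sketch below compares the three ways of indexing a column that appear in the code above. It assumes `stringsAsFactors = TRUE` is passed explicitly, so that column `c` is a factor as in the quoted output; from R 4.0.0 on, the default would give a character column instead. The names `cls_single`, `cls_matrix` and `cls_double` are made up for illustration.

```r
# Same test data as above, with factors requested explicitly (assumption:
# needed on R >= 4.0.0 to reproduce the "factor" class shown in this thread).
ds_test <- data.frame(a = c(1, NA, 2), b = rep(NA, 3), c = c("A", "b", NA),
                      stringsAsFactors = TRUE)

variable <- "c"

# Single bracket with one index: list-style subsetting, keeps the data.frame
cls_single <- class(ds_test[variable])
# Single bracket with row and column index: one column is dropped to a vector
cls_matrix <- class(ds_test[, variable])
# Double bracket: extracts the column itself
cls_double <- class(ds_test[[variable]])

print(c(single = cls_single, matrix = cls_matrix, double = cls_double))

# inherits() follows the same pattern
is_factor_single <- inherits(ds_test[variable], "factor")
is_factor_double <- inherits(ds_test[[variable]], "factor")
print(c(single = is_factor_single, double = is_factor_double))
```

If this is right, the calls outside the function use `ds_test[, "c"]`, while the calls inside `test_class()` report `class(dataset[variable])`, so the two are not actually the same expression.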

Kind regards

Georg

--------------------------------

> Sent: Thursday, 23 June 2016 at 21:14
> From: "David L Carlson" <dcarl...@tamu.edu>
> To: "Bert Gunter" <bgunter.4...@gmail.com>
> Cc: "R Help" <r-help@r-project.org>
> Subject: Re: [R] Subscripting problem with is.na()
>
> Good point. I did not think about factors. Also your example raises another issue since column c is logical, but gets silently converted to numeric. This would seem to get the job done assuming the conversion is intended for numeric columns only:
>
> > test <- data.frame(a=c(1,NA,2), b = c("A","b",NA), c= rep(NA,3))
> > sapply(test, class)
>         a         b         c
> "numeric"  "factor" "logical"
> > num <- sapply(test, is.numeric)
> > test[, num][is.na(test[, num])] <- 0
> > test
>   a    b  c
> 1 1    A NA
> 2 0    b NA
> 3 2 <NA> NA
>
> David C
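
Regarding the factor issue raised in the quoted message, one possible way to replace NA in a single column, factor or not, is sketched below. This is only a sketch under assumptions: the helper name `replace_na_in_column` is hypothetical, the column is extracted with `[[ ]]` so that it is a vector rather than a one-column data.frame, and for factors the replacement value is added as a new level before it is assigned.

```r
# Hypothetical helper (not from the thread): replace NA in one column.
replace_na_in_column <- function(dataset, variable, value) {
  column <- dataset[[variable]]   # the column vector, not a data.frame

  if (is.factor(column)) {
    # A factor only accepts values that are among its levels, so the
    # replacement is converted to character and added as a level first.
    value <- as.character(value)
    levels(column) <- union(levels(column), value)
  }

  column[is.na(column)] <- value
  dataset[[variable]] <- column
  dataset
}

# Test data as in the thread; factors requested explicitly (assumption:
# needed on R >= 4.0.0, where strings are no longer converted by default).
ds_test <- data.frame(a = c(1, NA, 2), b = rep(NA, 3), c = c("A", "b", NA),
                      stringsAsFactors = TRUE)

res_a <- replace_na_in_column(ds_test, "a", value = -1)
res_b <- replace_na_in_column(ds_test, "b", value = -2)
res_c <- replace_na_in_column(ds_test, "c", value = -3)

print(sapply(res_c, class))
```

Note that the all-NA logical column `b` still becomes numeric when a number is assigned to it, which is the silent conversion mentioned in the quoted message.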