R version 3.5.1 (2018-07-02) -- "Feather Spray"
Copyright (C) 2018 The R Foundation for Statistical Computing
Platform: x86_64-w64-mingw32/x64 (64-bit)

Hi. I have a data set with day, month and year stored as integers, and I am creating a date column from them using lubridate. A hundred or so rows failed to parse. The problem is that April and September have rows with day = 31.

paste(df1$year, df1$month, df1$day, sep = "-")
ymd(paste(df1$year, df1$month, df1$day, sep = "-"))
# Warning message: 129 failed to parse. As expected in tutorial

# The resulting Date vector can be added to df1 as a new column called date:
df1$date <- ymd(paste(df1$year, df1$month, df1$day, sep = "-"))  # Same warning
head(df1)
sapply(df1$date, class)  # "Date"
summary(df1$date)
#         Min.      1st Qu.       Median         Mean      3rd Qu.         Max.   NA's
# "1977-07-16" "1984-03-12" "1990-07-22" "1990-12-15" "1997-07-29" "2002-12-31"  "129"

is_missing_date <- is.na(df1$date)
View(is_missing_date)
date_columns <- c("year", "month", "day")
missing_dates <- df1[is_missing_date, date_columns]
head(missing_dates)
#      year month day
# 3144 2000     9  31
# 3817 2000     4  31
# 3818 2000     4  31
# 3819 2000     4  31
# 3820 2000     4  31
# 3856 2000     9  31

I am trying to replace those days with 30. I have been all over the map in Google looking for a fix, but haven't found one. I am sure I have overcomplicated my attempts with ideas (below) from these and other sites.

https://stackoverflow.com/questions/14737773/replacing-occurrences-of-a-number-in-multiple-columns-of-data-frame-with-another?noredirect=1&lq=1
https://www.rdocumentation.org/packages/base/versions/3.5.1/topics/replace
https://stackoverflow.com/questions/48714625/error-in-data-frame-unused-argument

The following are screwy attempts at this simple repair:

??mutate_if
??replace

is_missing_date <- is.na(df1$date)
View(is_missing_date)
date_columns <- c("year", "month", "day")
missing_dates <- df1[is_missing_date, date_columns]
head(missing_dates)
#      year month day
# 3144 2000     9  31
# 3817 2000     4  31
# 3818 2000     4  31
# 3819 2000     4  31
# 3820 2000     4  31
# 3856 2000     9  31
# So those months with 30 days that have day = 31 need to be 30
View(missing_dates)

install.packages("dplyr")
library(dplyr)
View(missing_dates)
# ..those were the values you're going to replace

I thought this function from Stack Overflow would work, but I get an error when I try to add a filter:

# https://stackoverflow.com/questions/14737773/replacing-occurrences-of-a-number-in-multiple-columns-of-data-frame-with-another?noredirect=1&lq=1
df.Rep <- function(.data_Frame, .search_Columns, .search_Value, .sub_Value) {
  .data_Frame[, .search_Columns] <-
    ifelse(.data_Frame[, .search_Columns] == .search_Value,
           .sub_Value / .search_Value, 1) * .data_Frame[, .search_Columns]
  return(.data_Frame)
}
df.Rep(missing_dates, 3, 31, 30)

# --So I should be able to apply this to the complete df1 data somehow?
head(df1)
df.Rep(df1, filter(month == c(4,9)), 31, 30)
# Error in month == c(4, 9) : comparison (1) is possible only for atomic and list types

Other screwy attempts:

select(df1, month, day, year)
str(df1)
# 'data.frame': 34786 obs. of 14 variables:

# To choose rows, use filter():
# mutate_if(df1, month =4,9), day = 30)
filter(df1, month == c(4,9), day == 31)

df1 %>%
  group_by(month == c(4,9), day == 31) %>%
  tally()
# 1 FALSE FALSE 31161
# 2 FALSE TRUE    576
# 3 TRUE  FALSE  2981
# 4 TRUE  TRUE     68

df1 %>%
  mutate(day = replace(day, month == c(4,9), 30)) %>%
  as.data.frame()

View(as.list(df1, month == 4))
View(df1, month == c(4,9), day == 31)

df1 %>%
  group_by(month == c(4,9), day == 31) %>%
  tally()

View(df1, month == c(4,9))
# df1 %>%
#   group_by(month == c(4,9), day == 30) %>%

I know there is a simple solution, and it is driving me mad that it eludes me, even allowing for my being new to R.

Thank you for any advice.

WHP
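
For reference, a minimal sketch of one way the replacement described above could be done in base R. It assumes df1 has numeric year, month and day columns as in the output shown; the small data frame here is a made-up stand-in, not the real data. It also swaps month == c(4,9) for month %in% c(4,9): with ==, R recycles c(4, 9) down the column (row 1 against 4, row 2 against 9, and so on), which is likely why the tally above finds 68 rows rather than 129.

```r
# Hypothetical toy version of df1 (the real one has 34786 rows, 14 columns)
df1 <- data.frame(
  year  = c(2000, 2000, 2000, 2000, 2000),
  month = c(4, 9, 1, 4, 9),
  day   = c(31, 31, 31, 15, 30)
)

# TRUE only for rows that are in April or September AND have day 31.
# %in% checks every row against both 4 and 9; == would recycle c(4, 9).
bad_day <- df1$month %in% c(4, 9) & df1$day == 31

# Overwrite just those days, leaving every other row untouched
df1$day[bad_day] <- 30

# Rebuild the date column; base as.Date is used here so the sketch has no
# package dependencies, but lubridate::ymd() on the same string should work too
df1$date <- as.Date(paste(df1$year, df1$month, df1$day, sep = "-"),
                    format = "%Y-%m-%d")

df1
```

If other 30-day months (June, November) or February turn up with impossible days, the c(4, 9) vector and the replacement value would need adjusting. Whether changing 31 to 30 is the right correction for the data is a separate question.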