If you are worried about an NA in the first position, then use the following:

> library(zoo)
> y <- c(NA, 1, 2, NA, 4, NA)
> y <- na.locf(y, na.rm = FALSE)
> y
[1] NA  1  2  2  4  4
> y <- na.locf(y, fromLast = TRUE)
> y
[1] 1 1 2 2 4 4

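For anyone following along without zoo installed, the two passes (carry forward, then carry backward to fill a leading NA) can be approximated in base R. This is only a rough sketch: locf_fill is a made-up helper name, not a zoo function, and it assumes a plain atomic vector.

```r
# Hypothetical helper (not part of zoo): carry the last non-NA value forward.
# Leading NAs are left in place, similar to na.locf(y, na.rm = FALSE).
locf_fill <- function(y) {
  ok  <- !is.na(y)
  idx <- cumsum(ok)            # how many non-NA values seen so far (0 = none yet)
  c(NA, y[ok])[idx + 1]        # slot 1 holds NA for the "none yet" case
}

y <- c(NA, 1, 2, NA, 4, NA)

# Forward pass: trailing and interior NAs are filled, the leading NA remains.
fwd <- locf_fill(y)

# Backward pass: reverse, carry forward, reverse again to fill the leading NA.
both <- rev(locf_fill(rev(fwd)))

print(fwd)
print(both)
```

On the example vector this should give the same two results as the na.locf() calls above; for real time-series objects zoo is still the more convenient choice.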
On Mon, Jan 2, 2012 at 5:07 PM, Joshua Wiley <[email protected]> wrote:
> Good points, Rui.
>
> On Mon, Jan 2, 2012 at 12:48 PM, Rui Barradas <[email protected]> wrote:
>> Hello again,
>>
>> I believe we are all missing something. Isn't it possible to have NAs as the
>> first values of 'y'?
>> And isn't it also possible to have x[1] > 3?
>
> Theoretically, yes, in the OP's data, maybe? If the data is a time
> series (or time series like), the zoo package is not a bad environment
> to be working in anyway. There are all sorts of handy functions (I
> had almost recommended na.approx(), which replaces NAs with a linear
> interpolation) based on the OP's little example dataset. Not sure if
> the +2 thing is just an attempt at interpolation though or something
> more general.
>
>>
>> Here is my point (I have changed function 'f2' to predict for such cases,
>> 'f1' is rubbish)
>>
>> # Rui
>> f3 <- function(x, y){
>>     inx <- which(x > 3)
>>     ynx <- which(is.na(y))
>>     for(i in which(inx %in% ynx)) y[ynx[i]] <- y[ynx[i]-1] + 2L
>>     y
>> }
>>
>> # Jim's, as a function, 'na.rm' option added or else 'df3' would produce an
>> # error
>> require(zoo)
>> f4 <- function(x, y){
>>     y <- na.locf(y, na.rm=FALSE)
>>     inc <- cumsum(x > 3) * 2
>>     y + inc
>> }
>>
>> df <- data.frame(x = c(1,2,3,4,5), y = c(10,20,30,NA,NA))
>> df
>> df2 <- data.frame(x = c(1,2,3,4,5), y = c(10,20,NA,40,NA))
>> df2
>> df3 <- data.frame(x = c(1,2,3,4,5), y = rev(c(10,20,30,NA,NA)))
>> df3
>>
>> # Joshua
>> f(df$x, df$y)     # works
>> f(df2$x, df2$y)   # infinite loop
>> f(df3$x, df3$y)   # infinite loop
>>
>> # Rui
>> f3(df$x, df$y)     # works
>> f3(df2$x, df2$y)   # works as expected?
>> f3(df3$x, df3$y)   # works as expected?
>>
>> # Jim
>> f4(df$x, df$y)     # works
>> f4(df2$x, df2$y)   # works as expected?
>> f4(df3$x, df3$y)   # works as expected?
>>
>> If this makes sense, the performance tests are very much in favour of Jim's
>> solution.
>>
>>
>> # If this is what is asked for, test the performance
>> # with large enough N
>> N <- 1.e5
>> dftest <- data.frame(x=1:N, y=c(sample(c(rep(NA, 5), 10*1:5), N,
>>                      replace=TRUE)))
>>
>> sum(is.na(dftest))/N   # proportion of NAs in 'dftest'
>>
>> t2 <- system.time(invisible(apply(dftest, 2, f2)))[c(1, 3)]
>> t3 <- system.time(invisible(f3(dftest$x, dftest$y)))[c(1, 3)]
>> t4 <- system.time(invisible(f4(dftest$x, dftest$y)))[c(1, 3)]
>> rbind(t2=t2, t3=t3, t4=t4, t2.t3=t2/t3, t2.t4=t2/t4, t3.t4=t3/t4)
>>
>> Sample output
>>
>>       user.self   elapsed
>> t2      2.93000   2.95000
>> t3      0.22000   0.22000
>> t4      0.01000   0.01000
>> t2.t3  13.31818  13.40909
>> t2.t4 293.00000 295.00000
>> t3.t4  22.00000  22.00000
>>
>> A factor of 300 over the initial solution or 20+ over the other loop based
>> one.
>>
>> Downside, it needs an extra package loaded, but 'zoo' is rather
>> commonplace.
>>
>> Rui Barradas
>>
>> --
>> View this message in context:
>> http://r.789695.n4.nabble.com/Conditionally-adding-a-constant-tp4253049p4254470.html
>> Sent from the R help mailing list archive at Nabble.com.
>>
>> ______________________________________________
>> [email protected] mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
> --
> Joshua Wiley
> Ph.D. Student, Health Psychology
> Programmer Analyst II, Statistical Consulting Group
> University of California, Los Angeles
> https://joshuawiley.com/

-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.

