Re: [R] LOCF - Last Observation Carried Forward

Tony Plate Mon, 17 Nov 2003 10:03:58 -0800

Here's a faster version of "most.recent". It uses "rep()" in a vectorized manner.

> # Gabor Grothendieck's function:
> most.recent.cut <- function(x)
+     as.numeric(as.vector(cut(seq(x),c(which(x),Inf),lab=which(x),right=F)))
>
> # Version that uses which() and vectorized rep()
> most.recent <- function(x) {
+     # return a vector of indices of the most recent TRUE value
+     if (!is.logical(x))
+         stop("x must be logical")
+     x.pos <- which(x)
+     if (length(x.pos)==0 || x.pos[1] != 1)
+         x.pos <- c(1, x.pos)
+     rep(x.pos, c(diff(x.pos), length(x) - x.pos[length(x.pos)] + 1))
+ }
>

> x <- sample(c(T,F),1e7,rep=T)
> system.time(most.recent.cut(x))
[1] 41.21  0.54 41.98    NA    NA
> system.time(most.recent(x))
[1] 2.67 0.08 2.78   NA   NA
>

-- Tony Plate

At Friday 10:21 PM 11/14/2003 -0500, Gabor Grothendieck wrote:

From: Tony Plate <[EMAIL PROTECTED]>: > > Here's a function that does the essential computation (written to work in > both S-plus and R). > > This looks like one of those tricky problems that do not vectorize > easily. It would be simple to write a C-program to compute this very > efficiently. But are there any more efficient solutions than ones like the > below (that are written without resort to C)? > > most.recent <- function(x) { > # return a vector of indices of the most recent TRUE value > if (!is.logical(x)) > stop("x must be logical") > x[is.na(x)] <- FALSE > # x is a logical vector > r <- rle(x) > ends <- cumsum(r$lengths) > starts <- ends - r$lengths + 1 > spec <- as.list(as.data.frame(rbind(start=starts, len=r$lengths, > value=as.numeric(r$values), prev.end=c(NA, ends[-length(ends)])))) > names(spec) <- NULL > unlist(lapply(spec, function(s) if (s[3]) seq(s[1], len=s[2]) else > rep(s[4], len=s[2])), use.names=F) > } > > > x <- c(F,T,T,F,F,F,T,F) > > most.recent(x) > [1] NA 2 3 3 3 3 7 7 > > And using it to do the fill-forward: > > > x <- c(NA,2,3,NA,4,NA,5,NA,NA,NA,6,7,8,NA) > > x[most.recent(!is.na(x))] > [1] NA 2 3 3 4 4 5 5 5 5 6 7 8 8 > > > > Some timings: > > > x <- sample(c(T,F),1e4,rep=T) > > system.time(most.recent(x)) > [1] 0.33 0.01 0.47 NA NA > > x <- sample(c(T,F),1e5,rep=T) > > system.time(most.recent(x)) > [1] 4.27 0.06 6.44 NA NA > > x <- sample(c(T,F),1e6,rep=T) > > system.time(most.recent(x)) > [1] 47.27 0.17 47.97 NA NA > > > > -- Tony Plate > > PS. Actually, I just found a solution that I had lying around that is about > 70 times as fast on random test data like the above.
I was waiting for you to post this but didn't see it so I thought
I would post mine.  This one is 13x as fast and only requires
a single line of code.
> set.seed(111)
> x <- sample(c(T,F),10000,rep=T)
> system.time(z1 <- most.recent(x))
[1] 0.92 0.02 1.68   NA   NA
> system.time(z2 <- as.numeric(as.vector(
     cut(seq(x),c(which(x),Inf),lab=which(x),right=F))))
[1] 0.07 0.00 0.12   NA   NA
> all.equal(z1,z2)
[1] TRUE
______________________________________________
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help

Tony Plate [EMAIL PROTECTED]

______________________________________________
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help

Re: [R] LOCF - Last Observation Carried Forward

Reply via email to