[R] Is it possible to vectorize/accelerate this?
Dear Members,

I am working on a simulation experiment, but it has a bottleneck. Most of it is quite fast thanks to R and vectorization, but it contains one very slow for loop. Each element of a vector depends conditionally on the previous element of the same vector. It is like a simple cumulative function (e.g. cumsum), but with a condition. Let me show an example:

    a_vec = rnorm(100)
    b_vec = rep(0, 100)
    b_vec[1] = a_vec[1]
    for (i in 2:100) {
      b_vec[i] = ifelse(abs(b_vec[i-1] + a_vec[i]) > 1, a_vec[i], b_vec[i-1] + a_vec[i])
    }
    print(b_vec)

(The behaviour is like cumsum's, but when the absolute value of the running sum would exceed 1.0, the element takes the current value from a_vec instead.)

Is it possible to make this faster? I found that my approach is even slower than in Excel! Programming in C would be my last resort... Any suggestions?

Thank you,
Peter

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
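To make the recurrence concrete, here is a small hand-checkable run of the same rule. This is only a sketch: the function name cond_cumsum and the four-element input are made up for illustration, it assumes the test is abs(sum) > 1 as described in the post, and it uses a plain if/else inside the loop instead of ifelse().

```r
# Hypothetical helper wrapping the loop from the post; the name is illustrative.
cond_cumsum <- function(a_vec) {
  b_vec <- numeric(length(a_vec))
  b_vec[1] <- a_vec[1]
  for (i in seq_along(a_vec)[-1]) {
    s <- b_vec[i - 1] + a_vec[i]            # candidate running sum
    b_vec[i] <- if (abs(s) > 1) a_vec[i] else s
  }
  b_vec
}

# Values chosen so that every sum is exact in binary floating point.
a_small <- c(0.5, 0.25, 0.5, -0.25)
b_small <- cond_cumsum(a_small)
print(b_small)
# [1] 0.50 0.75 0.50 0.25
```

The third step would give 1.25, so the result restarts from a_small[3] = 0.5, and the fourth step continues accumulating from there.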
Re: [R] Is it possible to vectorize/accelerate this?
I don't immediately see a good trick for vectorization, so this seems to me to be a good candidate for work in a lower-level language. Staying within R, I'd suggest you use if and else rather than ifelse(), since your computation isn't vectorized: this will eliminate a small amount of overhead. Since you also always add a_vec, you could define b_vec as a copy of a_vec to avoid all those calls to subset a_vec, but I don't think the effect will be large and the code might not be as clear.

You indicated that you may be comfortable with writing C, but I'd suggest you look into the Rcpp/inline package pair, which makes the whole process much easier than it would otherwise be. I'm not at a computer right now or I'd write a fuller example, but the documentation for those packages is uncommonly good and you should be able to get it down into C++ easily. If you aren't able to get it by tomorrow, let me know and I can help troubleshoot. The only things I foresee that you'll need to change are zero-based indexing, C's loop syntax, and (I think) the call to abs(). (I always forget where abs() lives in C++.) The only possible holdup is that you need to be at a computer with a C compiler.

Hope this helps,
Michael
Re: [R] Is it possible to vectorize/accelerate this?
Thank you, I will try as soon as possible...

Regards,
Peter

2011/11/3 Mark Leeds marklee...@gmail.com:

> Hi hihi: you're not using the ifelse construct correctly, because it's
> already vectorized, so there's no need to use a loop. Check whether the
> code below works AND whether it's fast enough, because I didn't check
> either one. Also, I bet someone else can send something better, so I
> would wait anyway. Good luck.
>
>     set.seed(1)
>     avec = rnorm(100)
>     bvec = rep(0, 100)
>     bvec[1] = avec[1]
>     bveclagged <- c(999, head(bvec, -1))
>     bvec <- ifelse((abs(bveclagged + avec) > 1), avec, bveclagged + avec)
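Since Mark says he did not test his suggestion, a quick comparison against the loop may be worth running before relying on it. The sketch below assumes the comparison in both versions is > 1; loop_version and one_shot are illustrative names, and the input is chosen so that every sum is exact in binary floating point. Note that bveclagged is built from bvec before any element past the first has been filled in, so the single ifelse() call only sees zeros there.

```r
# The original recurrence, written as a function (illustrative name).
loop_version <- function(avec) {
  bvec <- rep(0, length(avec))
  bvec[1] <- avec[1]
  for (i in 2:length(avec)) {
    s <- bvec[i - 1] + avec[i]
    bvec[i] <- if (abs(s) > 1) avec[i] else s
  }
  bvec
}

# The one-shot ifelse() suggestion, wrapped in a function (illustrative name).
one_shot <- function(avec) {
  bvec <- rep(0, length(avec))
  bvec[1] <- avec[1]
  bveclagged <- c(999, head(bvec, -1))   # lag of a vector that is still mostly zeros
  ifelse(abs(bveclagged + avec) > 1, avec, bveclagged + avec)
}

avec <- c(0.25, 0.25, 0.25, 0.5)
print(loop_version(avec))  # 0.25 0.50 0.75 0.50
print(one_shot(avec))      # 0.25 0.50 0.25 0.50
```

On this input the two results disagree at the third element, which suggests that the dependence on the previously computed value is not captured by lagging the vector once.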
Re: [R] Is it possible to vectorize/accelerate this?
Hi:

You're doing the right thing in R by pre-allocating memory for the result, but ifelse() is a vectorized function and your loop operates elementwise, so if-else is more appropriate. Try

    for (i in 2:100) {
      b_vec[i] <- if (abs(b_vec[i-1] + a_vec[i]) > 1) a_vec[i] else b_vec[i-1] + a_vec[i]
    }

If speed is an issue, then I echo Michael's suggestion to write a C(++) function and call it within R. The inline package is good for this kind of thing.

HTH,
Dennis
Re: [R] Is it possible to vectorize/accelerate this?
Yes -- if else is much faster than ifelse() because if is a primitive while ifelse() is a whole function call (in fact, you can see the code by typing ifelse at the prompt, and see that it has two if calls within it).

Michael

On Thu, Nov 3, 2011 at 4:38 PM, hihi v.p.m...@freemail.hu wrote:

> Hi, thank you for your very quick response. :-) Is if with else faster
> than ifelse? I'm wondering (or there is something I don't know).
>
> Best regards,
> Peter
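Michael's point about if being a primitive and ifelse() being an ordinary function can be checked directly at the prompt. The snippet below is a small sketch using only base R; the variable names x, via_if and via_ifelse are illustrative.

```r
# `if` is a primitive; ifelse is a regular R function (a closure).
print(is.primitive(`if`))
print(is.primitive(ifelse))
print(typeof(ifelse))

# For a single value both forms give the same answer ...
x <- 0.3
via_if <- if (abs(x) > 1) 0 else x
via_ifelse <- ifelse(abs(x) > 1, 0, x)
print(identical(via_if, via_ifelse))

# ... but ifelse() is designed to test every element of a vector,
# whereas if() expects a single condition.
print(ifelse(c(0.3, 1.7) > 1, 0, c(0.3, 1.7)))
```

Inside a loop that handles one element at a time, the vectorized machinery of ifelse() is extra work on each iteration, which is consistent with the advice given in the thread.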
Re: [R] Is it possible to vectorize/accelerate this?
You should get familiar with some basic timing tools and techniques so you can investigate things like this yourself.

system.time is the most basic timing tool. E.g.,

    > system.time(for(i in 1:1000)f0(a))
       user  system elapsed
     22.920   0.000  22.932

means it took c. 23 seconds of real time to run f0(a) 1000 times.

When comparing timings, it makes things easier to define a series of functions that implement the various algorithms but have the same inputs and outputs. E.g., for your problem

    f0 <- function(a_vec) {
      b_vec <- a_vec
      for (i in 2:length(b_vec)){
        b_vec[i] <- ifelse(abs(b_vec[i-1] + a_vec[i]) > 1, a_vec[i], b_vec[i-1] + a_vec[i])
      }
      b_vec
    }

    f1 <- function(a_vec) {
      b_vec <- a_vec
      for (i in 2:length(b_vec)){
        b_vec[i] <- if(abs(b_vec[i-1] + a_vec[i]) > 1) a_vec[i] else b_vec[i-1] + a_vec[i]
      }
      b_vec
    }

    f2 <- function(a_vec) {
      b_vec <- a_vec
      for (i in 2:length(b_vec)){
        if(abs(s <- b_vec[i-1] + a_vec[i]) <= 1) b_vec[i] <- s
      }
      b_vec
    }

Then run them with the same dataset:

    > a <- runif(1000, 0, .3)
    > system.time(for(i in 1:1000)f0(a))
       user  system elapsed
     22.920   0.000  22.932
    > system.time(for(i in 1:1000)f1(a))
       user  system elapsed
      5.510   0.000   5.514
    > system.time(for(i in 1:1000)f2(a))
       user  system elapsed
      4.210   0.000   4.217

(The rbenchmark package's benchmark function encapsulates this idiom.)

It pays to use a dataset similar to the one you will ultimately be using, where "similar" depends on the context. E.g., the algorithm in f2 is relatively faster when the cumsum exceeds 1 most of the time:

    > a <- runif(1000, 0, 10)
    > system.time(for(i in 1:1000)f0(a))
       user  system elapsed
     21.900   0.000  21.912
    > system.time(for(i in 1:1000)f1(a))
       user  system elapsed
      4.610   0.000   4.609
    > system.time(for(i in 1:1000)f2(a))
       user  system elapsed
      2.490   0.000   2.494

If you will be working with large datasets, you should look at how the time grows as the size of the dataset grows. If the time looks quadratic between, say, length 100 and length 200, don't waste your time testing it for length 100. For algorithms that work on data.frames (or matrices), the relative speed often depends on the ratio of the number of rows to the number of columns of data. Check that out. For these sorts of tests it is worthwhile to make a function to generate typical-looking data of any desired size.

It doesn't take too long to do this once you have the right mindset. Once you do, you don't have to rely on folklore like "never use loops" and can instead do evidence-based computing.

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com
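Bill's advice about writing a data generator and checking how the time grows with the input size could be sketched roughly as follows. f1 and f2 are the two if-based variants from his message, written with <- and the comparisons spelled out; make_data and time_at are hypothetical helper names, and the sizes and repetition count are arbitrary choices. No timings are shown because they depend on the machine and the R version.

```r
f1 <- function(a_vec) {
  b_vec <- a_vec
  for (i in 2:length(b_vec)) {
    b_vec[i] <- if (abs(b_vec[i - 1] + a_vec[i]) > 1) a_vec[i] else b_vec[i - 1] + a_vec[i]
  }
  b_vec
}

f2 <- function(a_vec) {
  b_vec <- a_vec
  for (i in 2:length(b_vec)) {
    # Only overwrite when the running sum stays within [-1, 1].
    if (abs(s <- b_vec[i - 1] + a_vec[i]) <= 1) b_vec[i] <- s
  }
  b_vec
}

# Hypothetical generator for "typical looking" data of any size.
make_data <- function(n, upper = 0.3) runif(n, 0, upper)

# Hypothetical helper: elapsed seconds for `reps` calls of f on n values.
time_at <- function(f, n, reps = 20) {
  a <- make_data(n)
  system.time(for (i in seq_len(reps)) f(a))[["elapsed"]]
}

set.seed(1)
sizes <- c(1000, 2000, 4000)
elapsed <- sapply(sizes, function(n) time_at(f2, n))
print(data.frame(n = sizes, elapsed = elapsed))

# Before comparing speed, confirm that the variants agree.
a <- make_data(1000)
print(identical(f1(a), f2(a)))
```

If the elapsed time roughly doubles when n doubles, growth looks linear over this range; a fourfold increase would point to quadratic behaviour.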
Re: [R] Is it possible to vectorize/accelerate this?
I have to admit to not doing careful timing tests, but I often eliminate if() lines as follows (bad/good is just my preference):

    BAD:  b[i] <- if (a[i] > 1) a[i] else a[i-1]
    GOOD: b[i] <- a[i] * (a[i] > 1) + a[i-1] * (a[i] <= 1)

--
Sent from my Cray XK6
Pendeo-navem mei anguillae plena est.
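The arithmetic form above multiplies each branch by a logical that R coerces to 0 or 1. Because the toy rule depends only on a, not on earlier values of b, it can also be applied to whole vectors at once. The sketch below is one way to check that; the input values, the choice b[1] = a[1], and the names b_if, b_mask, cur and prev are assumptions made for illustration. Both products are always evaluated, so an Inf in the unselected branch would turn into NaN (Inf * 0), which is worth keeping in mind.

```r
a <- c(0.5, 1.5, 0.25, 2, 0.75)
n <- length(a)

# Element-wise version with if/else ("BAD" form).
b_if <- numeric(n)
b_if[1] <- a[1]                      # assumed starting value
for (i in 2:n) b_if[i] <- if (a[i] > 1) a[i] else a[i - 1]

# Arithmetic-mask version ("GOOD" form), applied to the whole vector.
cur  <- a[-1]                        # a[i]   for i = 2..n
prev <- a[-n]                        # a[i-1] for i = 2..n
b_mask <- c(a[1], cur * (cur > 1) + prev * (cur <= 1))

print(b_if)
print(b_mask)
print(isTRUE(all.equal(b_if, b_mask)))
```

This works here only because no element of b feeds into the next one; the recurrence in the original question does not have that property.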
Re: [R] Is it possible to vectorize/accelerate this?
I neglected to give another benefit of putting your algorithms into functions: you can use the compiler package to compile them, which can give a big boost in speed. E.g., I compiled the functions f0, f1, and f2 that I defined earlier to make new functions f0_c, f1_c, and f2_c:

    > library(compiler)
    > f0_c <- cmpfun(f0)
    > f1_c <- cmpfun(f1)
    > f2_c <- cmpfun(f2)
    > system.time(for(i in 1:1000)f0_c(a)) # a is runif(1000, 0, 10)
       user  system elapsed
     18.620   0.000  18.649
    > system.time(for(i in 1:1000)f1_c(a))
       user  system elapsed
      1.290   0.000   1.288
    > system.time(for(i in 1:1000)f2_c(a))
       user  system elapsed
      0.790   0.000   0.791

Compare those times with the 23, 5.5, and 4.2 seconds for the non-compiled versions. I haven't used the compiler package enough to generate any folklore on it, but it certainly helps in this simple example.

(identical() shows that the outputs of all these functions are the same.)

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com
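A self-contained way to repeat the compiler experiment might look like the sketch below. It redefines f2 from the earlier message so the block runs on its own; the seed and the repetition count of 200 are arbitrary. One caveat, stated as background rather than as part of the thread: since R 3.4.0 the just-in-time compiler byte-compiles functions by default, so on a current R the gap between f2 and f2_c may be much smaller than the 2011 figures, and no timings are shown here.

```r
library(compiler)

f2 <- function(a_vec) {
  b_vec <- a_vec
  for (i in 2:length(b_vec)) {
    if (abs(s <- b_vec[i - 1] + a_vec[i]) <= 1) b_vec[i] <- s
  }
  b_vec
}

# Byte-compile the function explicitly.
f2_c <- cmpfun(f2)

set.seed(1)
a <- runif(1000, 0, 10)

# Check that compilation did not change the result before timing anything.
print(identical(f2(a), f2_c(a)))

print(system.time(for (i in 1:200) f2(a)))
print(system.time(for (i in 1:200) f2_c(a)))
```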