Re: [R] speeding up sum of squared differences calculation
I am using a sum of squared differences in the objective function of an optimization problem I am doing and I have managed to speed it up using the outer function versus the nested for loops, but my suspicion is that the calculation could be done even quicker. Please see the code below for a simple example. If anyone can point out a faster way I would appreciate it greatly. First, look at ?system.time ; it's intended for finding out how much compute time a command takes. You're also calculating elapsed time which is dependent on (among other things) other processes running so maybe not the most reliable thing for benchmarking. Second, outer() is fast because it avoids loops and uses the fastest vector routines R can (essentially, replicating the vector and then relying on simple subtraction of vectors). In principle calling outer() loses a little time in internal checks on what object types and function you've provided etc.; in practice that doesn't add up to much. You can check by using just the core code from outer with an added multiplication instead of the dimensioning statement: ssqdif - function(X, Y=X) { #From 'outer' without modification Y - rep(Y, rep.int(length(X), length(Y))) X - rep(X, times = ceiling(length(Y)/length(X))) #For this case: sum((X-Y)^2) #SLIGHTLY quicker than d-X-Y; sum(d*d) } system.time(ssqdif(X)) #Comparing outer() method: system.time({gg - outer(X, X, FUN=-); sum(gg*gg)}) There's little practical difference; both hover from 0.00 to 0.03 s system time. I could barely tell the difference even averaged over 100 runs; I was getting an average around 0.007 (system time) and 2.5s user time for both methods. Conclusion: hard to beat outer() for this purpose in R S Ellison *** This email and any attachments are confidential. Any use...{{dropped:8}} __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] speeding up sum of squared differences calculation
Conclusion: hard to beat outer() for this purpose in R ... unless you use Ken Knoblauch's suggestion of dist() or Rccp. Nice indeed. S Ellison *** This email and any attachments are confidential. Any use...{{dropped:8}} __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] speeding up sum of squared differences calculation
Actually, you don't need to use c() in the dist example, either, which should shave off a few microseconds for a function call. It was late last night ... Ken On 22-10-2013 15:08, S Ellison wrote: Conclusion: hard to beat outer() for this purpose in R ... unless you use Ken Knoblauch's suggestion of dist() or Rccp. Nice indeed. S Ellison *** This email and any attachments are confidential. Any use, copying or disclosure other than by the intended recipient is unauthorised. If you have received this message in error, please notify the sender immediately via +44(0)20 8943 7000 or notify postmas...@lgcgroup.com and delete this message and any copies from your computer and network. LGC Limited. Registered in England 2991879. Registered office: Queens Road, Teddington, Middlesex, TW11 0LY, UK -- Kenneth Knoblauch Inserm U846 Stem-cell and Brain Research Institute Department of Integrative Neurosciences 18 avenue du Doyen Lépine 69500 Bron France tel: +33 (0)4 72 91 34 77 fax: +33 (0)4 72 91 34 61 portable: +33 (0)6 84 10 64 10 http://www.sbri.fr/members/kenneth-knoblauch.html __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] speeding up sum of squared differences calculation
Thanks for the Rccp example Ken! I vaguely knew about Rccp, but I didn't realize how easy it was to use it. -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Kenneth Knoblauch Sent: Tuesday, October 22, 2013 9:13 AM To: S Ellison Cc: r-help@r-project.org Subject: Re: [R] speeding up sum of squared differences calculation Actually, you don't need to use c() in the dist example, either, which should shave off a few microseconds for a function call. It was late last night ... Ken On 22-10-2013 15:08, S Ellison wrote: Conclusion: hard to beat outer() for this purpose in R ... unless you use Ken Knoblauch's suggestion of dist() or Rccp. Nice indeed. S Ellison *** This email and any attachments are confidential. Any use, copying or disclosure other than by the intended recipient is unauthorised. If you have received this message in error, please notify the sender immediately via +44(0)20 8943 7000 or notify postmas...@lgcgroup.com and delete this message and any copies from your computer and network. LGC Limited. Registered in England 2991879. Registered office: Queens Road, Teddington, Middlesex, TW11 0LY, UK -- Kenneth Knoblauch Inserm U846 Stem-cell and Brain Research Institute Department of Integrative Neurosciences 18 avenue du Doyen Lépine 69500 Bron France tel: +33 (0)4 72 91 34 77 fax: +33 (0)4 72 91 34 61 portable: +33 (0)6 84 10 64 10 http://www.sbri.fr/members/kenneth-knoblauch.html __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] speeding up sum of squared differences calculation
There's little practical difference; both hover from 0.00 to 0.03 s system time. I could barely tell the difference even averaged over 100 runs; I was getting an average around 0.007 (system time) and 2.5s user time for both methods. It's almost always better to use a high precision timer, as implemented in the microbenchmark package: library(microbenchmark) ssqdif - function(X, Y=X) { #From 'outer' without modification Y - rep(Y, rep.int(length(X), length(Y))) X - rep(X, times = ceiling(length(Y)/length(X))) #For this case: sum((X-Y)^2) #SLIGHTLY quicker than d-X-Y; sum(d*d) } outerdif - function(X, Y = X) { gg - outer(X, Y, FUN=-) sum(gg*gg) } X - runif(1000) microbenchmark( ssqdif(X), outerdif(X) ) Unit: milliseconds expr min lq median uq max neval ssqdif(X) 9.035473 9.912253 14.65940 16.34044 68.30620 100 outerdif(X) 8.962955 9.647820 14.85338 17.00048 66.89351 100 Looking at the range of values you can see indeed that the performance is indeed almost identical. Hadley -- Chief Scientist, RStudio http://had.co.nz/ __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] speeding up sum of squared differences calculation
outer() and dist() are good for speed on smaller problems but they require O(length(X)^2) memory. This can slow things down or even stop the calculations for large problems. You can gain a lot of speed without the memory problem by summing things up in chunks. E.g., On a Linux box I compared fChunk - function(x) { ans - 0 for(i in seq_along(x)[-1]) { ans - ans + sum( (x[i] - x[seq_len(i-1)])^2) } 2*ans } with the original algorithm (slightly cleaned up) fOrig -function (x) { ans - 0 for (i in seq_along(x)[-1]) { for (j in seq_len(i)) { ans - ans + (x[i] - x[j])^2 } } 2 * ans } and the ones based on outer() and dist(). fOuter - function(x) sum(outer(x, x, FUN=`-`)^2) fDist - function(x) 2 * sum(dist(x)^2) for a vector of length 1. z - rnorm(10^4) t(sapply(list(fOrig=fOrig, fChunk=fChunk, fDist=fDist, fOuter=fOuter), function(f)tryCatch(system.time(f(z)), error=function(e)NA)[1:3])) user.self sys.self elapsed fOrig 90.2620.000 90.378 fChunk 0.7760.004 0.779 fDist 1.0920.276 1.370 fOuter 0.7401.068 1.855 They all give slightly different answers because the round-off error depends on the order in which sums and squares are done. Of course, the Rcpp version has no memory penalty and is faster than any of them. Bill Dunlap Spotfire, TIBCO Software wdunlap tibco.com -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of S Ellison Sent: Tuesday, October 22, 2013 6:08 AM To: r-help@r-project.org Cc: Ken Knoblauch Subject: Re: [R] speeding up sum of squared differences calculation Conclusion: hard to beat outer() for this purpose in R ... unless you use Ken Knoblauch's suggestion of dist() or Rccp. Nice indeed. S Ellison *** This email and any attachments are confidential. Any use...{{dropped:8}} __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] speeding up sum of squared differences calculation
All, I am using a sum of squared differences in the objective function of an optimization problem I am doing and I have managed to speed it up using the outer function versus the nested for loops, but my suspicion is that the calculation could be done even quicker. Please see the code below for a simple example. If anyone can point out a faster way I would appreciate it greatly. Thanks, Roger X - runif(1000) now - proc.time() ans - 0 for (i in 1:length(X)) { for (j in 1:length(X)) { ans - ans + (X[i]-X[j])*(X[i]-X[j]) } } ans speed - proc.time() - now; cat( That took , round(speed[3],1), secs.\n, sep=) now - proc.time() gg - outer(X, X, FUN=-) sum(gg*gg) speed - proc.time() - now; cat( That took , round(speed[3],1), secs.\n, sep=) __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.