Re: [R] speeding up sum of squared differences calculation

2013-10-22 Thread S Ellison
 I am using a sum of squared differences in the objective function of an
 optimization problem I am doing and I have managed to speed it up using
 the outer function versus the nested for loops, but my suspicion is
 that the calculation could be done even quicker.  Please see the code
 below for a simple example.  If anyone can point out a faster way I
 would appreciate it greatly.

First, look at ?system.time ; it's intended for finding out how much compute 
time a command takes. You're also calculating elapsed time which is dependent 
on (among other things) other processes running so maybe not the most reliable 
thing for benchmarking. 

Second, outer() is fast because it avoids loops and uses the fastest vector 
routines R can (essentially, replicating the vector and then relying on simple 
subtraction of vectors). In principle calling outer() loses a little time in 
internal checks on what object types and function you've provided etc.; in 
practice that doesn't add up to much. You can check by using just the core code 
from outer with an added multiplication instead of the dimensioning statement:

ssqdif - function(X, Y=X) {
  #From 'outer' without modification
Y - rep(Y, rep.int(length(X), length(Y)))
X - rep(X, times = ceiling(length(Y)/length(X)))
#For this case:
sum((X-Y)^2) #SLIGHTLY quicker than d-X-Y; sum(d*d)
}

system.time(ssqdif(X))

#Comparing outer() method:
system.time({gg - outer(X, X, FUN=-); sum(gg*gg)})

There's little practical difference; both hover from 0.00 to 0.03 s system 
time. I could barely tell the difference even averaged over 100 runs; I was 
getting an average around 0.007 (system time) and 2.5s user time for both 
methods.

Conclusion: hard to beat outer() for this purpose in R

S Ellison




***
This email and any attachments are confidential. Any use...{{dropped:8}}

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] speeding up sum of squared differences calculation

2013-10-22 Thread S Ellison
 Conclusion: hard to beat outer() for this purpose in R
... unless you use Ken Knoblauch's suggestion of dist() or Rccp.

Nice indeed.

 
S Ellison


***
This email and any attachments are confidential. Any use...{{dropped:8}}

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] speeding up sum of squared differences calculation

2013-10-22 Thread Kenneth Knoblauch
Actually, you don't need to use c() in the dist example, either, which 
should shave off a few microseconds for a function call.  It was late 
last night ...


Ken

On 22-10-2013 15:08, S Ellison wrote:

Conclusion: hard to beat outer() for this purpose in R

... unless you use Ken Knoblauch's suggestion of dist() or Rccp.

Nice indeed.


S Ellison


***
This email and any attachments are confidential. Any use, copying or
disclosure other than by the intended recipient is unauthorised. If
you have received this message in error, please notify the sender
immediately via +44(0)20 8943 7000 or notify postmas...@lgcgroup.com
and delete this message and any copies from your computer and network.
LGC Limited. Registered in England 2991879.
Registered office: Queens Road, Teddington, Middlesex, TW11 0LY, UK


--
Kenneth Knoblauch
Inserm U846
Stem-cell and Brain Research Institute
Department of Integrative Neurosciences
18 avenue du Doyen Lépine
69500 Bron
France
tel: +33 (0)4 72 91 34 77
fax: +33 (0)4 72 91 34 61
portable: +33 (0)6 84 10 64 10
http://www.sbri.fr/members/kenneth-knoblauch.html

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] speeding up sum of squared differences calculation

2013-10-22 Thread Bos, Roger
Thanks for the Rccp example Ken!  I vaguely knew about Rccp, but I didn't 
realize how easy it was to use it.

-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
Behalf Of Kenneth Knoblauch
Sent: Tuesday, October 22, 2013 9:13 AM
To: S Ellison
Cc: r-help@r-project.org
Subject: Re: [R] speeding up sum of squared differences calculation

Actually, you don't need to use c() in the dist example, either, which should 
shave off a few microseconds for a function call.  It was late last night ...

Ken

On 22-10-2013 15:08, S Ellison wrote:
 Conclusion: hard to beat outer() for this purpose in R
 ... unless you use Ken Knoblauch's suggestion of dist() or Rccp.
 
 Nice indeed.
 
 
 S Ellison
 
 
 ***
 This email and any attachments are confidential. Any use, copying or 
 disclosure other than by the intended recipient is unauthorised. If 
 you have received this message in error, please notify the sender 
 immediately via +44(0)20 8943 7000 or notify postmas...@lgcgroup.com 
 and delete this message and any copies from your computer and network.
 LGC Limited. Registered in England 2991879.
 Registered office: Queens Road, Teddington, Middlesex, TW11 0LY, UK

--
Kenneth Knoblauch
Inserm U846
Stem-cell and Brain Research Institute
Department of Integrative Neurosciences
18 avenue du Doyen Lépine
69500 Bron
France
tel: +33 (0)4 72 91 34 77
fax: +33 (0)4 72 91 34 61
portable: +33 (0)6 84 10 64 10
http://www.sbri.fr/members/kenneth-knoblauch.html

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] speeding up sum of squared differences calculation

2013-10-22 Thread Hadley Wickham
 There's little practical difference; both hover from 0.00 to 0.03 s system 
 time. I could barely tell the difference even averaged over 100 runs; I was 
 getting an average around 0.007 (system time) and 2.5s user time for both 
 methods.

It's almost always better to use a high precision timer, as
implemented in the microbenchmark package:

library(microbenchmark)

ssqdif - function(X, Y=X) {
  #From 'outer' without modification
  Y - rep(Y, rep.int(length(X), length(Y)))
  X - rep(X, times = ceiling(length(Y)/length(X)))
  #For this case:
  sum((X-Y)^2) #SLIGHTLY quicker than d-X-Y; sum(d*d)
}

outerdif - function(X, Y = X) {
  gg - outer(X, Y, FUN=-)
  sum(gg*gg)
}

X - runif(1000)

microbenchmark(
  ssqdif(X),
  outerdif(X)
)

Unit: milliseconds
expr  min   lq   median   uq  max neval
   ssqdif(X) 9.035473 9.912253 14.65940 16.34044 68.30620   100
 outerdif(X) 8.962955 9.647820 14.85338 17.00048 66.89351   100

Looking at the range of values you can see indeed that the performance
is indeed almost identical.

Hadley

-- 
Chief Scientist, RStudio
http://had.co.nz/

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] speeding up sum of squared differences calculation

2013-10-22 Thread William Dunlap
outer() and dist() are good for speed on smaller problems but they
require O(length(X)^2) memory.  This can slow things down or even
stop the calculations for large problems.  You can gain a lot of speed
without the memory problem by summing things up in chunks.
E.g., On a Linux box I compared
fChunk - function(x) {
ans - 0
for(i in seq_along(x)[-1]) {
ans - ans + sum( (x[i] - x[seq_len(i-1)])^2)
}
2*ans
}
with the original algorithm (slightly cleaned up)
fOrig -function (x) {
ans - 0
for (i in seq_along(x)[-1]) {
for (j in seq_len(i)) {
ans - ans + (x[i] - x[j])^2
}
}
2 * ans
}
and the ones based on outer() and dist().
   fOuter - function(x) sum(outer(x, x, FUN=`-`)^2)
   fDist - function(x) 2 * sum(dist(x)^2)
for a vector of length 1.

z - rnorm(10^4)
t(sapply(list(fOrig=fOrig, fChunk=fChunk, fDist=fDist, fOuter=fOuter),
function(f)tryCatch(system.time(f(z)), error=function(e)NA)[1:3]))
  user.self sys.self elapsed
   fOrig 90.2620.000  90.378
   fChunk 0.7760.004   0.779
   fDist  1.0920.276   1.370
   fOuter 0.7401.068   1.855

They all give slightly different answers because the round-off error depends on
the order in which sums and squares are done.

Of course, the Rcpp version has no memory penalty and is faster than any of 
them.

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com


 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
 Behalf
 Of S Ellison
 Sent: Tuesday, October 22, 2013 6:08 AM
 To: r-help@r-project.org
 Cc: Ken Knoblauch
 Subject: Re: [R] speeding up sum of squared differences calculation
 
  Conclusion: hard to beat outer() for this purpose in R
 ... unless you use Ken Knoblauch's suggestion of dist() or Rccp.
 
 Nice indeed.
 
 
 S Ellison
 
 
 ***
 This email and any attachments are confidential. Any use...{{dropped:8}}
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] speeding up sum of squared differences calculation

2013-10-21 Thread Bos, Roger
All,

I am using a sum of squared differences in the objective function of an 
optimization problem I am doing and I have managed to speed it up using the 
outer function versus the nested for loops, but my suspicion is that the 
calculation could be done even quicker.  Please see the code below for a simple 
example.  If anyone can point out a faster way I would appreciate it greatly.

Thanks,

Roger

X - runif(1000)

now - proc.time()
ans - 0
for (i in 1:length(X)) {
  for (j in 1:length(X)) {
ans - ans + (X[i]-X[j])*(X[i]-X[j])
  }
}
ans
speed - proc.time() - now; cat( That took , round(speed[3],1),  secs.\n, 
sep=)

now - proc.time()
gg - outer(X, X, FUN=-)
sum(gg*gg)
speed - proc.time() - now; cat( That took , round(speed[3],1),  secs.\n, 
sep=)

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.