For what it's worth (probably not much, as these simple benchmarks are
rarely representative of real code and so need to be taken with a huge
chunk of salt), here is what happens with your examples in R 2.1.0 with
the current byte compiler.

Define your examples as functions:

    n = 1e6; iA = seq(2,n); x = double(n);
    f1 <- function(x, iA) for (i in iA) x[i] = x[i-1]
    f2 <- function(x, iA) for (i in iA) x[i-1]
    f3 <- function(x, iA) for (i in iA) 1
    f4 <- function(x, iA) for (i in iA) x[i] = 1.0
    f5 <- function(x, iA) for (i in iA) i-1
    f6 <- function(x, iA) for (i in iA) i

Make byte compiled versions:

    library(compiler)  # assuming cmpfun comes from an installed 'compiler' package
    f1c <- cmpfun(f1)
    f2c <- cmpfun(f2)
    f3c <- cmpfun(f3)
    f4c <- cmpfun(f4)
    f5c <- cmpfun(f5)
    f6c <- cmpfun(f6)

and run them:

    > system.time(f1(x, iA))
    [1] 5.43 0.04 5.56 0.00 0.00
    > system.time(f1c(x, iA))
    [1] 1.77 0.03 1.81 0.00 0.00

    > system.time(f2(x, iA))
    [1] 1.72 0.01 1.74 0.00 0.00
    > system.time(f2c(x, iA))
    [1] 0.63 0.00 0.63 0.00 0.00

    > system.time(f3(x, iA))
    [1] 0.19 0.00 0.20 0.00 0.00
    > system.time(f3c(x, iA))
    [1] 0.14 0.00 0.15 0.00 0.00

    > system.time(f4(x, iA))
    [1] 3.78 0.03 3.82 0.00 0.00
    > system.time(f4c(x, iA))
    [1] 1.26 0.02 1.30 0.00 0.00

    > system.time(f5(x, iA))
    [1] 0.99 0.00 1.00 0.00 0.00
    > system.time(f5c(x, iA))
    [1] 0.30 0.00 0.31 0.00 0.00

    > system.time(f6(x, iA))
    [1] 0.21 0.00 0.23 0.00 0.00
    > system.time(f6c(x, iA))
    [1] 0.17 0.00 0.20 0.00 0.00

I'll let you do the arithmetic.  The byte compiler does get rid of a
fair bit of interpreter overhead (which is large in these kinds of
examples compared to most real code) but there is still considerable
room for improvement.  The byte code engine currently uses the same
internal C code for doing the actual work as the interpreter, so
improvements there would help both interpreted and byte compiled code.
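
If you want to tabulate the speedups yourself, the elapsed times (the
third number in each system.time() result above) can be compared
directly, for example:

    # elapsed times (seconds) taken from the runs above
    interpreted <- c(f1 = 5.56, f2 = 1.74, f3 = 0.20, f4 = 3.82, f5 = 1.00, f6 = 0.23)
    compiled    <- c(f1 = 1.81, f2 = 0.63, f3 = 0.15, f4 = 1.30, f5 = 0.31, f6 = 0.20)
    round(interpreted / compiled, 1)  # speedup of byte-compiled over interpreted code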

Best,

luke

On Fri, 22 Apr 2005, Vadim Ogranovich wrote:

If we are on the subject of byte compilation, let me bring up a couple
of examples which have been puzzling me for some time.  I'd like to
know a) whether compilation is likely to improve the performance for
this type of computation, and b) at least roughly understand the
reasons for the observed numbers, specifically why x[i]<- assignment is
so much slower than x[i] extraction.

The loops below are typical of any recursive calculation where the new
value of a vector element is based on its immediate neighbor, say to
the left.  Specifically, we assign the previous value to the current
element.

# this shows that the assignment x[i]<- is the bottleneck in the loop
n = 1e6; iA = seq(2,n); x = double(n); system.time(for (i in iA) x[i] = x[i-1])
[1] 4.29 0.00 4.30 0.00 0.00
n = 1e6; iA = seq(2,n); x = double(n); system.time(for (i in iA) x[i-1])
[1] 1.46 0.01 1.46 0.00 0.00

# the overhead of the loop itself is reasonably low, just 0.17 sec
n = 1e6; iA = seq(2,n); x = double(n); system.time(for (i in iA) 1)
[1] 0.17 0.01 0.17 0.00 0.00

# pure assignment (w/o the extraction x[i]) takes 3.09 sec. Thus x[i] as
# extraction is (3.09 - 0.17)/(0.79 - 0.17) = 4.7 times faster than x[i]<-
# as assignment. This looks a bit odd.
n = 1e6; iA = seq(2,n); x = double(n); system.time(for (i in iA) x[i] = 1.0)
[1] 3.08 0.00 3.09 0.00 0.00


# this shows that just evaluation of (i-1) takes about (0.79 - 0.24) = 0.55 sec
# on my machine (AMD 64 bit). Looks too slow.
n = 1e6; iA = seq(2,n); x = double(n); system.time(for (i in iA) i-1)
[1] 0.79 0.00 0.79 0.00 0.00
n = 1e6; iA = seq(2,n); x = double(n); system.time(for (i in iA) i)
[1] 0.24 0.01 0.24 0.00 0.00
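
My rough understanding from the R language definition is that a
subassignment like x[i] <- v is evaluated more or less as the rewriting
sketched below, so each iteration pays for an extra function call and,
potentially, a temporary copy of x; I may well be missing something
here, so corrections welcome:

# roughly how the evaluator rewrites x[i] <- v
`*tmp*` <- x
x <- `[<-`(`*tmp*`, i, value = v)
rm(`*tmp*`)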

Thanks,
Vadim

-----Original Message-----
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Luke Tierney
Sent: Friday, April 22, 2005 7:33 AM
To: Peter Dalgaard
Cc: Jason Liao; r-help@stat.math.ethz.ch
Subject: Re: [R] when can we expect Prof Tierney's compiled R?

On Wed, 20 Apr 2005, Peter Dalgaard wrote:

Luke Tierney <[EMAIL PROTECTED]> writes:

Vectorized operations in R are also as fast as compiled C (because
that is what they are :-)).  A compiler such as the one I'm working on
will be able to make most difference for non-vectorizable or not very
vectorizable code.  It may also be able to reduce the need for
intermediate allocations in vectorizable code, which may have other
benefits beyond just speed improvements.
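
For example, evaluating a compound vectorized expression like the one
below allocates a full-length temporary for the intermediate result
(a small sketch, with made-up names, just to illustrate the point):

    n <- 1e6
    a <- 2; b <- 3
    x <- runif(n)
    # a * x first allocates a length-n temporary; adding b then allocates
    # the final result, so one vectorized line touches two full-length vectors.
    y <- a * x + b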

Actually, it has struck me a couple of times that these operations are
not as fast as they could be, since they are outside the scope of fast
BLAS routines, but "embarrassingly parallel" code could easily be
written for the relevant hardware.  Even on uniprocessor systems there
might be speedups that the C compiler cannot find (e.g. because it
cannot assume that source and destination of the operation are
distinct).

My guess is that for anything beyond basic operations we are doing OK
on uniprocessors, but it would be useful to do some testing to be sure.
For the basic operations I suspect we are paying a heavy price for the
way we handle recycling, both in terms of overhead as such and in terms
of inhibiting compiler optimizations.  For performance it would
probably be better to code the scalar-vector, equal-length-vector, and
general cases separately, though keeping the code maintainable may be a
bit of a challenge.  Again, testing on a range of platforms and
compilers would be useful.
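
(Recycling here is the elementwise rule that repeats the shorter
operand to match the longer one, as in this small example:)

    c(1, 2, 3, 4) + c(10, 20)  # the shorter vector is recycled: 11 22 13 24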

With multiprocessors likely to become more widely available
it would be good to look into ways of factoring the
vectorized math code so we can slide in one that uses threads
when appropriate.  This should dovetail nicely with
compilation to identify larger vectorized expressions that
can be parallelized as a unit; I hope to look into this a bit
this summer.

luke



--
Luke Tierney
Chair, Statistics and Actuarial Science
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa                  Phone:             319-335-3386
Department of Statistics and        Fax:               319-335-3017
    Actuarial Science
241 Schaeffer Hall                  email:      [EMAIL PROTECTED]
Iowa City, IA 52242                 WWW:  http://www.stat.uiowa.edu






