Re: [Rd] Cost of method dispatching: was: when can we expect Prof Tierney's compiled R?

2005-05-04 Thread Duncan Murdoch
Vadim Ogranovich wrote:
 


-----Original Message-----
From: Prof Brian Ripley [mailto:[EMAIL PROTECTED]
Sent: Wednesday, April 27, 2005 1:13 AM
To: Vadim Ogranovich
Cc: Luke Tierney; r-devel@stat.math.ethz.ch
Subject: Re: [Rd] RE: [R] when can we expect Prof Tierney's compiled R?

On Tue, 26 Apr 2005, Vadim Ogranovich wrote:
...
The arithmetic shows that x[i]<- is still the bottleneck. I suspect
that this is due to a very involved dispatching/search for the
appropriate function on the C level. There might be significant gain
if loops somehow cached the result of the initial dispatching. This is
what you probably referred to as additional improvements in the engine
itself.

I'd be surprised if dispatching were the issue: have you (C-level)
profiled to find out?  Please do so: these statements do tend to get
perpetuated as fact.

For the record, I didn't profile the dispatching, so it is only my guess
and is not verified by C-level profiling.
The guess is based on reading the code and on the following timings at
the R level:
n = 1e6; iA = seq(2,n); x = double(n);
f1 <- function(x, iA) for (i in iA) x[i] = c(1.0)
f2 <- function(x, iA) for (i in iA) x = c(1.0)
last.gc.time = gc.time(TRUE)
system.time(f1(x, iA), gcFirst=TRUE)
[1] 3.50 0.01 3.52 0.00 0.00
print(gc.time() - last.gc.time); last.gc.time = gc.time()
[1] 1.25 0.82 1.24 0.00 0.00
system.time(f2(x, iA), gcFirst=TRUE)
[1] 0.76 0.00 0.77 0.00 0.00
print(gc.time() - last.gc.time); last.gc.time = gc.time()
[1] 0.25 0.18 0.23 0.00 0.00
f1 and f2 are identical except that the first assigns to an element of
the vector (and thus goes through method dispatch).
Originally I had thought that the number of allocations in f1 and f2
must be the same, namely the c(1.0) call. But gc.time() shows that the
number of allocations in f1 is indeed, as Prof. Ripley suggests, bigger
than in f2. It is not clear to me where these extra allocations come
from and whether they are necessary. All x[i] = c(1.0) needs to do is
create a new vector, c(1.0), which is a step common to f1 and f2, and
then copy from that vector into x[i].
However, even after discounting the gc time, the assignment to x[i]
seems heavy.

You cannot cache the result, as [<- can change the class of
x, as could other operations done by the rhs (e.g. if it were
x[i] <- g(x, i) the function g could change its argument).
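For instance (the class names and method below are invented, purely to
illustrate), an S3 replacement method is free to return an object of a
different class, so a dispatch decision cached from an earlier
iteration could point at the wrong method:

  `[<-.foo` <- function(x, i, value) {
    y <- unclass(x)
    y[i] <- value
    class(y) <- "bar"    # the replacement result has a new class
    y
  }
  x <- structure(double(3), class = "foo")
  x[1] <- 1
  class(x)               # "bar": the next x[i] <- ... must dispatch anew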

Yes, however R might try to use the last method found and only fall
back to the full dispatch when that fails. This should give a lot of
gain in the typical case where the variables' types do not change.
There are probably efficiency improvements available, but they need to
be done very carefully.  For example, the default method of [<- could
be called in one step, and as a side effect create a more specific
method.  So for the second call we should call the more specific one,
but the default call will still be valid in the sense that the
arguments match the signature (S4) or the class matches the name (S3),
but not in the sense that it is the method that should be called.

Duncan Murdoch


Re: [Rd] Cost of method dispatching: was: when can we expect Prof Tierney's compiled R?

2005-05-04 Thread Luke Tierney
On Wed, 4 May 2005, Duncan Murdoch wrote:
Vadim Ogranovich wrote:

-----Original Message-----
From: Prof Brian Ripley [mailto:[EMAIL PROTECTED]
Sent: Wednesday, April 27, 2005 1:13 AM
To: Vadim Ogranovich
Cc: Luke Tierney; r-devel@stat.math.ethz.ch
Subject: Re: [Rd] RE: [R] when can we expect Prof Tierney's compiled R?

On Tue, 26 Apr 2005, Vadim Ogranovich wrote:
...
The arithmetic shows that x[i]<- is still the bottleneck. I suspect that
this is due to a very involved dispatching/search for the appropriate
function on the C level. There might be significant gain if loops
somehow cached the result of the initial dispatching. This is what you
probably referred to as additional improvements in the engine itself.

I'd be surprised if dispatching were the issue: have you (C-level)
profiled to find out?  Please do so: these statements do tend to get
perpetuated as fact.

For the record, I didn't profile the dispatching, so it is only my guess
and is not verified by C-level profiling.
The guess is based on reading the code and on the following timings at
the R level:
n = 1e6; iA = seq(2,n); x = double(n);
f1 <- function(x, iA) for (i in iA) x[i] = c(1.0)
f2 <- function(x, iA) for (i in iA) x = c(1.0)
last.gc.time = gc.time(TRUE)
system.time(f1(x, iA), gcFirst=TRUE)
[1] 3.50 0.01 3.52 0.00 0.00
print(gc.time() - last.gc.time); last.gc.time = gc.time()
[1] 1.25 0.82 1.24 0.00 0.00
system.time(f2(x, iA), gcFirst=TRUE)
[1] 0.76 0.00 0.77 0.00 0.00
print(gc.time() - last.gc.time); last.gc.time = gc.time()
[1] 0.25 0.18 0.23 0.00 0.00
f1 and f2 are identical except that the first assigns to an element of
the vector (and thus goes through method dispatch).
Originally I had thought that the number of allocations in f1 and f2
must be the same, namely the c(1.0) call. But gc.time() shows that the
number of allocations in f1 is indeed, as Prof. Ripley suggests, bigger
than in f2. It is not clear to me where these extra allocations come
from and whether they are necessary. All x[i] = c(1.0) needs to do is
create a new vector, c(1.0), which is a step common to f1 and f2, and
then copy from that vector into x[i].
However, even after discounting the gc time, the assignment to x[i]
seems heavy.

You cannot cache the result, as [<- can change the class of x, as could
other operations done by the rhs (e.g. if it were x[i] <- g(x, i) the
function g could change its argument).

Yes, however R might try to use the last method found and only fall
back to the full dispatch when that fails. This should give a lot of
gain in the typical case where the variables' types do not change.
There are probably efficiency improvements available, but they need to be
done very carefully.  For example, the default method of [<- could be called
in one step, and as a side effect create a more specific method.  So for the
second call we should call the more specific one, but the default call will
still be valid in the sense that the arguments match the signature (S4) or
the class matches the name (S3), but not in the sense that it is the method
that should be called.

Duncan Murdoch
Let's slow down here.  In
function(x, iA) for (i in iA) x[i] = c(1.0)
there are three functions in the body: [<-, [, and c.  All are
.Primitives with internal C implementations for which methods can be
written.  These implementations all look roughly like this:
    if (a method is available)
        call the method
    else
        run the default C code
The test of whether methods are available first looks at a bit.  If
that bit is not set, there are guaranteed not to be any methods.  Only
if that bit is set does any further search for methods happen.  In
this example, and in most uses of these functions, that bit is not
set, so dispatch involves testing one bit.  This most important case
has already been optimized.  Further optimizing cases where methods
might be available could be worth doing and will happen over time as
we learn what is necessary and feasible and where the bottlenecks are.
But this sort of thing has to be done with care.  Here, however, this
just is not an issue.
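One can watch the fast path from R (a rough sketch; the class and the
method below are invented, and timings are machine-dependent; note the
classed loop also copies the vector on every call, so it overstates the
pure dispatch cost):

  n <- 1e4
  plain <- double(n)                 # methods bit unset: one-bit test
  system.time(for (i in seq_len(n)) plain[i] <- 1)

  `[<-.slow` <- function(x, i, value) {   # R-level replacement method
    y <- unclass(x); y[i] <- value; class(y) <- "slow"; y
  }
  classed <- structure(double(n), class = "slow")
  system.time(for (i in seq_len(n)) classed[i] <- 1)  # full dispatch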
The default code for [<- might well merit another look; I suspect it
has not been as heavily optimized as the default code for [.  How much
difference this would make to realistic code, and whether the cost of
implementation and maintenance is worthwhile, is not clear.
On the additional allocations: that is the function call mechanism of
the interpreter.  This could be done differently, but given the
semantics of argument matching and ... arguments there is a limit on
what an interpreter can realistically do.  Again, whether the cost of
making changes to the function call mechanism, in terms of both
implementation and maintenance, is justified is not clear.  Some of us
are likely to look at the function call overhead sometime this summer;
I wouldn't want to bet on the outcome though.
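The per-call allocation is easy to see from R (function names invented
for illustration): wrapping a primitive in a closure forces the
interpreter to build an environment and promises on every call:

  f_prim <- function(n) { s <- 0; for (i in seq_len(n)) s <- s + 1; s }
  add    <- function(a, b) a + b     # a closure wrapping the primitive
  f_clos <- function(n) { s <- 0; for (i in seq_len(n)) s <- add(s, 1); s }

  system.time(f_prim(1e6))
  system.time(f_clos(1e6))       # slower: one closure call per iteration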
luke
--
Luke Tierney
Chair, Statistics and Actuarial Science
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa

[Rd] Cost of method dispatching: was: when can we expect Prof Tierney's compiled R?

2005-05-03 Thread Vadim Ogranovich
 

 -----Original Message-----
 From: Prof Brian Ripley [mailto:[EMAIL PROTECTED]
 Sent: Wednesday, April 27, 2005 1:13 AM
 To: Vadim Ogranovich
 Cc: Luke Tierney; r-devel@stat.math.ethz.ch
 Subject: Re: [Rd] RE: [R] when can we expect Prof Tierney's compiled R?

 On Tue, 26 Apr 2005, Vadim Ogranovich wrote:

...
  The arithmetic shows that x[i]<- is still the bottleneck. I suspect
  that this is due to a very involved dispatching/search for the
  appropriate function on the C level. There might be significant gain
  if loops somehow cached the result of the initial dispatching. This
  is what you probably referred to as additional improvements in the
  engine itself.

 I'd be surprised if dispatching were the issue: have you
 (C-level) profiled to find out?  Please do so: these
 statements do tend to get perpetuated as fact.


For the record, I didn't profile the dispatching, so it is only my guess
and is not verified by C-level profiling.

The guess is based on reading the code and on the following timings at
the R level:
> n = 1e6; iA = seq(2,n); x = double(n);
> f1 <- function(x, iA) for (i in iA) x[i] = c(1.0)
> f2 <- function(x, iA) for (i in iA) x = c(1.0)
> last.gc.time = gc.time(TRUE)
> system.time(f1(x, iA), gcFirst=TRUE)
[1] 3.50 0.01 3.52 0.00 0.00
> print(gc.time() - last.gc.time); last.gc.time = gc.time()
[1] 1.25 0.82 1.24 0.00 0.00
> system.time(f2(x, iA), gcFirst=TRUE)
[1] 0.76 0.00 0.77 0.00 0.00
> print(gc.time() - last.gc.time); last.gc.time = gc.time()
[1] 0.25 0.18 0.23 0.00 0.00

f1 and f2 are identical except that the first assigns to an element of
the vector (and thus goes through method dispatch).

Originally I had thought that the number of allocations in f1 and f2
must be the same, namely the c(1.0) call. But gc.time() shows that the
number of allocations in f1 is indeed, as Prof. Ripley suggests, bigger
than in f2. It is not clear to me where these extra allocations come
from and whether they are necessary. All x[i] = c(1.0) needs to do is
create a new vector, c(1.0), which is a step common to f1 and f2, and
then copy from that vector into x[i].

However, even after discounting the gc time, the assignment to x[i]
seems heavy.
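As a first step toward the profiling Prof. Ripley asks for, the R-level
profiler at least localizes the time (a minimal sketch; the output file
name is arbitrary, and confirming where the time goes inside [<- itself
would need a C-level profiler such as gprof):

 Rprof("f1.prof")          # start the R profiler
 f1(x, iA)
 Rprof(NULL)               # stop profiling
 summaryRprof("f1.prof")   # time spent per R function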

 
 You cannot cache the result, as [<- can change the class of
 x, as could other operations done by the rhs (e.g. if it were
 x[i] <- g(x, i) the function g could change its argument).

Yes, however R might try to use the last method found and only fall
back to the full dispatch when that fails. This should give a lot of
gain in the typical case where the variables' types do not change.
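A toy sketch of this "last method found" idea, in R purely for
illustration (real support would have to live in the C evaluator;
cached_dispatch and its cache stand in for the internal lookup):

 ## monomorphic inline cache: remember the last (class, method) pair
 ## and reuse it while the class stays the same
 .cache <- NULL
 cached_dispatch <- function(generic, x, ...) {
   cls <- class(x)[1L]
   if (!is.null(.cache) && .cache$class == cls)
     return(.cache$fun(x, ...))              # fast path: cache hit
   fun <- getS3method(generic, cls, optional = TRUE)
   if (is.null(fun)) fun <- getS3method(generic, "default")
   .cache <<- list(class = cls, fun = fun)   # remember for next call
   fun(x, ...)
 }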
