[Rd] 'xtfrm' performance (influences 'order' performance) in R devel

2008-09-09 Thread Sklyar, Oleg (London)
Hello everybody,

it looks like the presence of some (I do not know which) S4 methods for a
given S4 class degrades the performance of xtfrm (used in 'order' in new
R-devel) by several orders of magnitude. This is for classes that ARE
derived from numeric directly and thus should be quite trivial to convert
to numeric.

Consider the following example:

setClass("TimeDateBase",
    representation("numeric", mode="character"),
    prototype(mode="posix")
)
setClass("TimeDate",
    representation("TimeDateBase", tzone="character"),
    prototype(tzone="London")
)
x = new("TimeDate", 1220966224 + runif(1e5))

system.time({ z = order(x) })
##  system.time({ z = order(x) })
##   user  system elapsed 
##  0.048   0.000   0.048 

getClass("TimeDate")
## Class "TimeDate"

## Slots:

## Name:       .Data     tzone      mode
## Class:   numeric character character

## Extends: 
## Class "TimeDateBase", directly
## Class "numeric", by class "TimeDateBase", distance 2
## Class "vector", by class "TimeDateBase", distance 3


Now, if I load a library that not only defines these same classes, but
also a bunch of methods for those, then I have the following result:

library(AHLCalendar)
x = now() + runif(1e5) ## just random times in POSIXct format
x[1:5]
## TimeDate [posix] object in 'Europe/London' of length 5:
## [1] 2008-09-09 14:19:35.218 2008-09-09 14:19:35.672
## [3] 2008-09-09 14:19:35.515 2008-09-09 14:19:35.721
## [5] 2008-09-09 14:19:35.657

 system.time({ z = order(x) })


Enter a frame number, or 0 to exit   

 1: system.time({
 2: order(x)
 3: lapply(z, function(x) if (is.object(x)) xtfrm(x) else x)
 4: FUN(X[[1]], ...)
 5: xtfrm(x)
 6: xtfrm.default(x)
 7: as.vector(rank(x, ties.method = "min", na.last = "keep"))
 8: rank(x, ties.method = "min", na.last = "keep")
 9: switch(ties.method, average = , min = , max =
.Internal(rank(x[!nas], ties.
10: .gt(c(1220966375.21811, 1220966375.67217, 1220966375.51470,
1220966375.7211
11: x[j]
12: x[j]

Selection: 0
Timing stopped at: 47.618 13.791 66.478 

At the same time:

system.time({ z = as.numeric(x) }) ## same as x@.Data
##   user  system elapsed 
##  0.001   0.000   0.001 

The only difference between the two is that I have the following methods
defined for TimeDate (full listing below). 

Any idea why this could be happening? And yes, it is down to the xtfrm
function; 'order' was just the place where the problem occurred. Should
the xtfrm function be smarter with respect to classes that are actually
derived from 'numeric'?

 showMethods(class="TimeDate")
Function: + (package base)
e1=TimeDate, e2=TimeDate
e1=TimeDate, e2=numeric
(inherited from: e1=TimeDateBase, e2=numeric)

Function: - (package base)
e1=TimeDate, e2=TimeDate

Function: Time (package AHLCalendar)
x=TimeDate

Function: TimeDate (package AHLCalendar)
x=TimeDate

Function: TimeDate<- (package AHLCalendar)
x=TimeSeries, value=TimeDate

Function: TimeSeries (package AHLCalendar)
x=data.frame, ts=TimeDate
x=matrix, ts=TimeDate
x=numeric, ts=TimeDate

Function: [ (package base)
x=TimeDate, i=POSIXt, j=missing
x=TimeDate, i=Time, j=missing
x=TimeDate, i=TimeDate, j=missing
x=TimeDate, i=integer, j=missing
(inherited from: x=TimeDateBase, i=ANY, j=missing)
x=TimeDate, i=logical, j=missing
(inherited from: x=TimeDateBase, i=ANY, j=missing)
x=TimeSeries, i=TimeDate, j=missing
x=TimeSeries, i=TimeDate, j=vector

Function: [<- (package base)
x=TimeDate, i=ANY, j=ANY, value=ANY
x=TimeDate, i=ANY, j=ANY, value=numeric
x=TimeDate, i=missing, j=ANY, value=ANY
x=TimeDate, i=missing, j=ANY, value=numeric

Function: add (package AHLCalendar)
x=TimeDate

Function: addMonths (package AHLCalendar)
x=TimeDate

Function: addYears (package AHLCalendar)
x=TimeDate

Function: align (package AHLCalendar)
x=TimeDate, to=character
x=TimeDate, to=missing

Function: as.POSIXct (package base)
x=TimeDate

Function: as.POSIXlt (package base)
x=TimeDate

Function: coerce (package methods)
from=TimeDate, to=TimeDateBase

Function: coerce<- (package methods)
from=TimeDate, to=numeric

Function: dates (package AHLCalendar)
x=TimeDate

Function: format (package base)
x=TimeDate

Function: fxFwdDate (package AHLCalendar)
x=TimeDate, country=character

Function: fxSettleDate (package AHLCalendar)
x=TimeDate, country=character

Function: holidays (package AHLCalendar)
x=TimeDate

Function: index (package AHLCalendar)
x=TimeDate, y=POSIXt
x=TimeDate, y=Time
x=TimeDate, y=TimeDate

Function: initialize (package methods)
.Object=TimeDate
(inherited from: .Object=ANY)

Function: leapYear (package AHLCalendar)
x=TimeDate

Function: mday (package AHLCalendar)
x=TimeDate

Function: mode (package base)
x=TimeDate
(inherited from: x=TimeDateBase)

Function: mode<- (package base)
x=TimeDate, value=character
(inherited from: x=TimeDateBase, value=character)

Function: month (package AHLCalendar)
x=TimeDate

Function: pretty (package base)
x=TimeDate

Function: prettyFormat (package AHLCalendar)
x=TimeDate, munit=character
x=TimeDate, 

Re: [Rd] 'xtfrm' performance (influences 'order' performance) in R devel

2008-09-09 Thread John Chambers
No definitive answers, but here are a few observations.

In the call to order() code, I notice that you have dropped into the branch
if (any(unlist(lapply(z, is.object
where the alternative in your case would seem to have been going 
directly to the internal code.

You can consider a method for xtfrm(), which would help but won't get 
you completely back to a trivial computation.  Alternatively, order() 
should be eligible for the new mechanism of defining methods for "...".

(Individual existing methods may not be the issue, and one can't infer 
anything definite from the evidence given,  but a plausible culprit is 
the [ method.  Because [] expressions appear so often, it's always 
chancy to define a nontrivial method for this function.)
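
To make the concern about "[" concrete, here is a minimal sketch under
stated assumptions. The class "TD" and the counter n_calls are hypothetical
stand-ins, not taken from AHLCalendar. It only shows that every x[k] on such
an object enters the S4 method, which is what the repeated x[j] frames in
the traceback suggest is happening once per comparison.

```r
library(methods)

# Hypothetical stand-in class: numeric data plus a time-zone slot.
setClass("TD", contains = "numeric", slots = c(tzone = "character"))

n_calls <- 0L  # counts how often the "[" method is entered

# A deliberately simple "[" method that rebuilds the object on every call.
setMethod("[", signature(x = "TD", i = "ANY", j = "missing"),
  function(x, i, j, ..., drop = TRUE) {
    n_calls <<- n_calls + 1L
    new("TD", x@.Data[i], tzone = x@tzone)
  })

x <- new("TD", runif(1000), tzone = "London")

# Element-wise access, one method call (and one new object) per element.
for (k in seq_len(100)) x[k]

n_calls  # one increment per subsetting call in the loop
```

Subsetting the plain data once, x@.Data[i], avoids the per-call dispatch and
object construction entirely.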

John

Re: [Rd] 'xtfrm' performance (influences 'order' performance) in R devel

2008-09-09 Thread Sklyar, Oleg (London)
Thanks for a quick reply, I was thinking of [ methods myself, but there
are so many of them. I only tested [(x=TimeDate,i=TimeDate,j=missing),
which is a completely non-standard one. It did not seem to have any
effect though. 

I was thinking of writing the 'order' method and will experiment with
getting the one for xtfrm. However, would it not be reasonable for the
default xtfrm to check whether the object inherits from a vector and, in
that case, simply return the .Data slot? This would solve this and similar
cases immediately:

if (inherits(x, "vector")) return(as.vector(x@.Data))

BTW, generally, xtfrm.default calls 'rank' and it is not clear why rank
should work for a generic S4 object... this is essentially where the
problem is.
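
A minimal sketch of the shortcut proposed above, written as a standalone
helper rather than as a change to xtfrm.default. The names "TD" and
xtfrm_shortcut are hypothetical, and is() is used here instead of
inherits() because the test is about S4 inheritance.

```r
library(methods)

# Hypothetical stand-in class: numeric data plus a time-zone slot.
setClass("TD", contains = "numeric", slots = c(tzone = "character"))

# Hypothetical helper, not part of base R: for an S4 object that extends
# "numeric", use its data part as the sort key; otherwise defer to xtfrm().
xtfrm_shortcut <- function(x) {
  if (isS4(x) && is(x, "numeric")) as.vector(x@.Data) else xtfrm(x)
}

v <- c(3.5, 1.25, 2)
x <- new("TD", v, tzone = "London")

key <- xtfrm_shortcut(x)   # plain numeric vector, no rank() involved
ord <- order(key)          # ordering computed on the plain vector
```

The sketch assumes the numeric data alone determines the ordering, which is
the case described in this thread but need not hold for every class that
extends a vector.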


On a side note, a week ago I submitted a patch for plot.default to
R-devel, but nobody reacted (I checked the most recent patched and devel
versions as well). It is really an ugly bug (e.g.
plot(1:5,1:5,xlim=c(-10,10),ylim=c(-8,3)) ) and the trivial patch fixes
it. I would be grateful if somebody from R-core could look at it.
Meanwhile I patch the graphics library before compiling R, which is not
the best solution. Here is the patch for src/library/graphics/plot.R

70,71c70,71
<   localAxis(if(is.null(y)) xy$x else x, side = 1, ...)
<   localAxis(if(is.null(y))  x   else y, side = 2, ...)
---
>   localAxis(xlim, side = 1, ...)
>   localAxis(ylim, side = 2, ...)


Dr Oleg Sklyar
Research Technologist
AHL / Man Investments Ltd
+44 (0)20 7144 3107
[EMAIL PROTECTED] 

Re: [Rd] 'xtfrm' performance (influences 'order' performance) in R devel

2008-09-09 Thread Sklyar, Oleg (London)
Ha, defined xtfrm for TimeDate, works instantly (xtfrm is already a
method). However, it won't be taken up by order as it is not in the
imported namespace, so order falls back to xtfrm.default.
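
For illustration, a minimal sketch of what such an xtfrm method might look
like. The class "TD" is a hypothetical stand-in for TimeDate, and the sketch
assumes an R version in which xtfrm accepts S4 methods; whether order()
picks the method up from inside a package namespace is exactly the open
question in this message and is not shown here.

```r
library(methods)

# Hypothetical stand-in class: numeric data plus a time-zone slot.
setClass("TD", contains = "numeric", slots = c(tzone = "character"))

# The sort key of a "TD" object is simply its underlying numeric data.
setMethod("xtfrm", "TD", function(x) x@.Data)

v <- c(3.5, 1.25, 2)
x <- new("TD", v, tzone = "London")

key <- xtfrm(x)  # dispatches to the method above instead of xtfrm.default
```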

Moreover, defining order (which is not a method unfortunately, *any
chance of changing this*?):

setGeneric("order")
setMethod("order", "TimeDate",
    function (..., na.last = TRUE, decreasing = FALSE)
        order(list(...)[EMAIL PROTECTED],na.last=na.last,
            decreasing=decreasing))

does not help either, as it won't be taken up: order still calls the
default one. What am I doing wrong?



Dr Oleg Sklyar
Research Technologist
AHL / Man Investments Ltd
+44 (0)20 7144 3107
[EMAIL PROTECTED] 

Re: [Rd] 'xtfrm' performance (influences 'order' performance) in R devel

2008-09-09 Thread Sklyar, Oleg (London)
In fact it all comes back to 'rank', which uses 'order(x[!nas])'
internally. Surprisingly, one does not get an infinite recursion: rank ->
order -> xtfrm -> rank -> ...

This is obviously only one of the possible outcomes, yet it seems to be
happening. The previous implementation of order did not have a reference
to xtfrm and thus would not cause this loop.

Dr Oleg Sklyar
Research Technologist
AHL / Man Investments Ltd
+44 (0)20 7144 3107
[EMAIL PROTECTED] 

Re: [Rd] 'xtfrm' performance (influences 'order' performance) in R devel

2008-09-09 Thread John Chambers
Sklyar, Oleg (London) wrote:
 Ha, defined xtfrm for TimeDate, works instantly (xtfrm is already a
 method). However, it won't be taken up by order as it is not in the
 imported namespace, so order falls back to xtfrm.default.
   
By method you mean generic?  xtfrm is an S3 generic.  I'm not clear 
what happens if you define an S3 method for it.  Yes, there is a problem 
defining an S4 generic; it would be good to deal with that.  
Nontrivial, however.

 Moreover, defining order (which is not a method unfortunately, *any
 chance of changing this*?):

 setGeneric("order")
 setMethod("order", "TimeDate", 
   function (..., na.last = TRUE, decreasing = FALSE) 
   order(list(...)[EMAIL PROTECTED],na.last=na.last,
 decreasing=decreasing))

 does not help either as it won't be taken up, order still calls the
 default one, what am I doing wrong?
   
I'm skeptical that this is true.  I did a simple example:

> setClass("foo", contains = "numeric", representation(flag = "logical"))
[1] "foo"
> xx = new("foo", rnorm(5))
> setGeneric("order", sig = "...")
Creating a generic for "order" in package  ".GlobalEnv"
(the supplied definition differs from and overrides the implicit 
generic in package "base": Signatures differ:  (...), (na.last, decreasing))
[1] "order"
> setMethod("order", "foo", function (..., na.last = TRUE, decreasing = 
FALSE){message("Method called"); order([EMAIL PROTECTED])})
[1] "order"
> order(xx)
Method called
[1] 2 4 3 1 5

You do need to be calling order() directly from one of your functions, 
and have it in your namespace, if your package has one.
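
A variant of the transcript above that forwards na.last and decreasing and
handles several sort keys may be sketched as follows. The class "TD" is a
hypothetical stand-in, and the method's formals assume the current
base::order() argument list, including its method argument, which did not
exist when this thread was written.

```r
library(methods)

# Hypothetical stand-in class: numeric data plus a time-zone slot.
setClass("TD", contains = "numeric", slots = c(tzone = "character"))

# Dispatch on "..." as in the transcript; R prints a message about creating
# the generic from base::order.
setGeneric("order", signature = "...")

# Selected when every argument passed through "..." is a "TD" object.
setMethod("order", "TD",
  function(..., na.last = TRUE, decreasing = FALSE,
           method = c("auto", "shell", "radix")) {
    keys <- lapply(list(...), function(a) a@.Data)  # plain numeric keys
    do.call(base::order,
            c(keys, list(na.last = na.last, decreasing = decreasing,
                         method = method)))
  })

x <- new("TD", c(2, 1, 2), tzone = "London")
y <- new("TD", c(1, 5, 0), tzone = "London")

res <- order(x, y)  # ties in x are broken by y
```

Calling base::order() explicitly inside the method keeps the call from
dispatching back to the new generic.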


Re: [Rd] 'xtfrm' performance (influences 'order' performance) in R devel

2008-09-09 Thread Sklyar, Oleg (London)

Aha, it works if I do

setGeneric("order", signature="...")

However the problem with that is that it generates a warning which I
cannot suppress on install:

Creating a generic for order in package  AHLCalendar
(the supplied definition differs from and overrides the implicit
generic in package base: Signatures differ:  (...), (na.last,
decreasing))

and it generates a warning about masking order from base on load:

  AHLCalendar [0.2.42] (9 Sep 2008). ?AHLCalendar or
vignette('AHLCalendar') to get started

Attaching package: 'AHLCalendar'

The following object(s) are masked from package:base :

 order 

The package exports (excerpt):

exportPattern("^[^\\.]")
exportMethods(order)

The reason for these messages is that the signature is different, and I
particularly dislike the masking (as I cannot predict whether it leads
to other problems somewhere). As I understand it, the current dotsMethods
mechanism does not allow mixing dots and other types, so I cannot really
define a matching signature. Is that right? Is there a way around it?

As for the rest, yes, I meant generic and it works nicely for xtfrm. But
as I wrote later, the problem is in 'rank' and rank is not generic so
defining a method would not help in calling a different implementation.

Thanks,
Oleg

Dr Oleg Sklyar
Research Technologist
AHL / Man Investments Ltd
+44 (0)20 7144 3107
[EMAIL PROTECTED] 
