[Rd] 'xtfrm' performance (influences 'order' performance) in R devel
Hello everybody,

it looks like the presence of some S4 methods (I do not know which) for a given S4 class degrades the performance of xtfrm (used in 'order' in the new R-devel) by several orders of magnitude. This is for classes that ARE derived directly from numeric and thus should be quite trivial to convert to numeric. Consider the following example:

    setClass("TimeDateBase", representation("numeric", mode="character"),
             prototype(mode="posix"))
    setClass("TimeDate", representation("TimeDateBase", tzone="character"),
             prototype(tzone="London"))
    x = new("TimeDate", 1220966224 + runif(1e5))

    system.time({ z = order(x) })
    ##    user  system elapsed
    ##   0.048   0.000   0.048

    getClass("TimeDate")
    ## Class "TimeDate"
    ##
    ## Slots:
    ##
    ## Name:      .Data     tzone      mode
    ## Class:   numeric character character
    ##
    ## Extends:
    ## Class "TimeDateBase", directly
    ## Class "numeric", by class "TimeDateBase", distance 2
    ## Class "vector", by class "TimeDateBase", distance 3

Now, if I load a library that not only defines these same classes but also a bunch of methods for them, I get the following result:

    library(AHLCalendar)
    x = now() + runif(1e5)  ## just random times in POSIXct format
    x[1:5]
    ## TimeDate [posix] object in 'Europe/London' of length 5:
    ## [1] 2008-09-09 14:19:35.218 2008-09-09 14:19:35.672
    ## [3] 2008-09-09 14:19:35.515 2008-09-09 14:19:35.721
    ## [5] 2008-09-09 14:19:35.657

    system.time({ z = order(x) })

    Enter a frame number, or 0 to exit

     1: system.time({
     2: order(x)
     3: lapply(z, function(x) if (is.object(x)) xtfrm(x) else x)
     4: FUN(X[[1]], ...)
     5: xtfrm(x)
     6: xtfrm.default(x)
     7: as.vector(rank(x, ties.method = "min", na.last = "keep"))
     8: rank(x, ties.method = "min", na.last = "keep")
     9: switch(ties.method, average = , min = , max = .Internal(rank(x[!nas], ties.
    10: .gt(c(1220966375.21811, 1220966375.67217, 1220966375.51470, 1220966375.7211
    11: x[j]
    12: x[j]

    Selection: 0
    Timing stopped at: 47.618 13.791 66.478

At the same time:

    system.time({ z = as.numeric(x) })  ## same as [EMAIL PROTECTED]
    ##    user  system elapsed
    ##   0.001   0.000   0.001

The only difference between the two is that I have the following methods defined for TimeDate (full listing below). Any idea why this could be happening? And yes, it is down to the xtfrm function; 'order' was just the place where the problem occurred. Should xtfrm be smarter with respect to classes that are actually derived from 'numeric'?

    showMethods(class="TimeDate")

    Function: + (package base)
    e1="TimeDate", e2="TimeDate"
    e1="TimeDate", e2="numeric"
        (inherited from: e1="TimeDateBase", e2="numeric")

    Function: - (package base)
    e1="TimeDate", e2="TimeDate"

    Function: Time (package AHLCalendar)
    x="TimeDate"

    Function: TimeDate (package AHLCalendar)
    x="TimeDate"

    Function: TimeDate<- (package AHLCalendar)
    x="TimeSeries", value="TimeDate"

    Function: TimeSeries (package AHLCalendar)
    x="data.frame", ts="TimeDate"
    x="matrix", ts="TimeDate"
    x="numeric", ts="TimeDate"

    Function: [ (package base)
    x="TimeDate", i="POSIXt", j="missing"
    x="TimeDate", i="Time", j="missing"
    x="TimeDate", i="TimeDate", j="missing"
    x="TimeDate", i="integer", j="missing"
        (inherited from: x="TimeDateBase", i="ANY", j="missing")
    x="TimeDate", i="logical", j="missing"
        (inherited from: x="TimeDateBase", i="ANY", j="missing")
    x="TimeSeries", i="TimeDate", j="missing"
    x="TimeSeries", i="TimeDate", j="vector"

    Function: [<- (package base)
    x="TimeDate", i="ANY", j="ANY", value="ANY"
    x="TimeDate", i="ANY", j="ANY", value="numeric"
    x="TimeDate", i="missing", j="ANY", value="ANY"
    x="TimeDate", i="missing", j="ANY", value="numeric"

    Function: add (package AHLCalendar)
    x="TimeDate"

    Function: addMonths (package AHLCalendar)
    x="TimeDate"

    Function: addYears (package AHLCalendar)
    x="TimeDate"

    Function: align (package AHLCalendar)
    x="TimeDate", to="character"
    x="TimeDate", to="missing"

    Function: as.POSIXct (package base)
    x="TimeDate"

    Function: as.POSIXlt (package base)
    x="TimeDate"

    Function: coerce (package methods)
    from="TimeDate", to="TimeDateBase"

    Function: coerce<- (package methods)
    from="TimeDate", to="numeric"

    Function: dates (package AHLCalendar)
    x="TimeDate"

    Function: format (package base)
    x="TimeDate"

    Function: fxFwdDate (package AHLCalendar)
    x="TimeDate", country="character"

    Function: fxSettleDate (package AHLCalendar)
    x="TimeDate", country="character"

    Function: holidays (package AHLCalendar)
    x="TimeDate"

    Function: index (package AHLCalendar)
    x="TimeDate", y="POSIXt"
    x="TimeDate", y="Time"
    x="TimeDate", y="TimeDate"

    Function: initialize (package methods)
    .Object="TimeDate"
        (inherited from: .Object="ANY")

    Function: leapYear (package AHLCalendar)
    x="TimeDate"

    Function: mday (package AHLCalendar)
    x="TimeDate"

    Function: mode (package base)
    x="TimeDate"
        (inherited from: x="TimeDateBase")

    Function: mode<- (package base)
    x="TimeDate", value="character"
        (inherited from: x="TimeDateBase", value="character")

    Function: month (package AHLCalendar)
    x="TimeDate"

    Function: pretty (package base)
    x="TimeDate"

    Function: prettyFormat (package AHLCalendar)
    x="TimeDate", munit="character"
    x="TimeDate",
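A minimal, self-contained sketch of the class setup from the post, using only the methods package (AHLCalendar is not involved). It checks that ordering the S4 object gives the same permutation as ordering its plain numeric .Data slot, which is the cheap computation the post compares against. It does not reproduce the slowdown, which in the post depends on the AHLCalendar methods and on the xtfrm of the R-devel of that time; the seed and vector length are arbitrary choices.

```r
library(methods)

# Classes as defined in the post: TimeDate extends numeric via TimeDateBase
setClass("TimeDateBase", representation("numeric", mode = "character"),
         prototype(mode = "posix"))
setClass("TimeDate", representation("TimeDateBase", tzone = "character"),
         prototype(tzone = "London"))

set.seed(42)  # arbitrary seed, only for reproducibility
x <- new("TimeDate", 1220966224 + runif(1e5))

# The plain numeric vector underneath the S4 object
plain <- x@.Data

# order() on the S4 object goes through xtfrm(); order() on plain does not
t_obj <- system.time(o_obj <- order(x))
t_plain <- system.time(o_plain <- order(plain))

# Both paths must produce the same permutation
stopifnot(identical(o_obj, o_plain))
```

Comparing t_obj with t_plain in a given R version shows how much the xtfrm() detour costs there.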
Re: [Rd] 'xtfrm' performance (influences 'order' performance) in R devel
No definitive answers, but here are a few observations.

In the code of order(), I notice that you have dropped into the branch

    if(any(unlist(lapply(z, is.object))))

where the alternative in your case would seem to have been going directly to the internal code.

You can consider a method for xtfrm(), which would help but won't get you completely back to a trivial computation. Alternatively, order() should be eligible for the new mechanism of defining methods for "...".

(Individual existing methods may not be the issue, and one can't infer anything definite from the evidence given, but a plausible culprit is the "[" method. Because [] expressions appear so often, it's always chancy to define a nontrivial method for this function.)

John
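The first suggestion, a method for xtfrm(), can be sketched as below. This assumes a current R, in which xtfrm() is an internal generic that accepts S4 methods; in the R-devel discussed here it was an S3 generic, as noted later in the thread. The method simply returns the underlying numeric vector, so order() never reaches the rank()-based default, and base::order() itself is left unmasked.

```r
library(methods)

setClass("TimeDateBase", representation("numeric", mode = "character"),
         prototype(mode = "posix"))
setClass("TimeDate", representation("TimeDateBase", tzone = "character"),
         prototype(tzone = "London"))

# Sketch of the workaround: an xtfrm() method that returns the numeric
# data part, which is already a valid sort key for TimeDate
setMethod("xtfrm", "TimeDate", function(x) x@.Data)

x <- new("TimeDate", c(30, 10, 20))
key <- xtfrm(x)   # dispatches to the method above
idx <- order(x)   # base::order() calls xtfrm() on objects like x
```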
Re: [Rd] 'xtfrm' performance (influences 'order' performance) in R devel
Thanks for the quick reply. I was thinking of "[" methods myself, but there are so many of them. I only tested the "[" method for x="TimeDate", i="TimeDate", j="missing", which is a completely non-standard one, and it did not seem to have any effect. I was thinking of writing an 'order' method and will experiment with one for xtfrm.

However, it seems reasonable for the default xtfrm to check whether the object inherits from a vector and, in that case, simply return the .Data slot. This would solve this and similar cases immediately:

    if (inherits(x, "vector")) return(as.vector(x@.Data))

BTW, in general xtfrm.default calls 'rank', and it is not clear why rank should work for a generic S4 object... this is essentially where the problem is.

On a side note, a week ago I submitted a patch for plot.default to R-devel, but nobody reacted (I checked the most recent patched and devel versions as well). It is a really ugly bug (e.g. plot(1:5, 1:5, xlim=c(-10,10), ylim=c(-8,3))) and the trivial patch fixes it. I would be grateful if somebody from R-core checked it. Meanwhile I patch the graphics library before compiling R, which is not the best solution. Here is the patch for src/library/graphics/plot.R:

    70,71c70,71
    <     localAxis(if(is.null(y)) xy$x else x, side = 1, ...)
    <     localAxis(if(is.null(y)) x else y, side = 2, ...)
    ---
    >     localAxis(xlim, side = 1, ...)
    >     localAxis(ylim, side = 2, ...)

Dr Oleg Sklyar
Research Technologist
AHL / Man Investments Ltd
+44 (0)20 7144 3107

-----Original Message-----
From: John Chambers
Sent: 09 September 2008 15:11
To: Sklyar, Oleg (London)
Cc: R-devel@r-project.org
Subject: Re: [Rd] 'xtfrm' performance (influences 'order' performance) in R devel
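The proposed short-cut for the default xtfrm can be tried out as a standalone helper without patching base R. xtfrm_fast below is a hypothetical name, not a base R function, and it uses is(x, "numeric") rather than the inherits(x, "vector") test from the proposal, since a class extending "list" would also pass a "vector" check while its data part is not a usable sort key.

```r
library(methods)

# Hypothetical helper (not part of base R): take the .Data short-cut for
# S4 objects whose data part is numeric, otherwise defer to xtfrm()
xtfrm_fast <- function(x) {
  if (isS4(x) && is(x, "numeric")) {
    return(as.vector(x@.Data))
  }
  xtfrm(x)
}

setClass("TimeDateBase", representation("numeric", mode = "character"),
         prototype(mode = "posix"))
setClass("TimeDate", representation("TimeDateBase", tzone = "character"),
         prototype(tzone = "London"))

x <- new("TimeDate", c(3, 1, 2))
k_s4 <- xtfrm_fast(x)                      # short-cut path
k_fac <- xtfrm_fast(factor(c("b", "a")))   # falls through to xtfrm()
```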
Re: [Rd] 'xtfrm' performance (influences 'order' performance) in R devel
Ha, I defined xtfrm for TimeDate and it works instantly (xtfrm is already a method). However, it won't be picked up by order, as it is not in the imported namespace, so order falls back to xtfrm.default. Moreover, defining order (which is unfortunately not a method; *any chance of changing this*?):

    setGeneric("order")
    setMethod("order", "TimeDate",
              function (..., na.last = TRUE, decreasing = FALSE)
                  order(list(...)[EMAIL PROTECTED], na.last=na.last, decreasing=decreasing))

does not help either, as it won't be picked up; order still calls the default one. What am I doing wrong?
Re: [Rd] 'xtfrm' performance (influences 'order' performance) in R devel
In fact it all comes back to 'rank', which uses 'order(x[!nas])' internally. Surprisingly, one does not get an infinite recursion: rank -> order -> xtfrm -> rank -> ... This is obviously only one of the possible outcomes, yet it seems to be happening. The previous implementation of order did not reference xtfrm and thus would not cause this infinite loop.

-----Original Message-----
From: Sklyar, Oleg (London)
Sent: 09 September 2008 15:49
To: John Chambers
Cc: R-devel@r-project.org
Subject: Re: [Rd] 'xtfrm' performance (influences 'order' performance) in R devel
Re: [Rd] 'xtfrm' performance (influences 'order' performance) in R devel
Sklyar, Oleg (London) wrote:
> Ha, defined xtfrm for TimeDate, works instantly (xtfrm is already a method). However, it won't be taken up by order as it is not in the imported namespace, so order falls back to xtfrm.default.

By "method" you mean generic? xtfrm is an S3 generic. I'm not clear what happens if you define an S3 method for it. Yes, there is a problem with defining an S4 generic for it; it would be good to deal with that. Nontrivial, however.

> Moreover, defining order (which is not a method unfortunately, *any chance of changing this*?):
> setGeneric("order")
> setMethod("order", "TimeDate", function (..., na.last = TRUE, decreasing = FALSE) order(list(...)[EMAIL PROTECTED], na.last=na.last, decreasing=decreasing))
> does not help either as it won't be taken up, order still calls the default one, what am I doing wrong?

I'm skeptical that this is true. I did a simple example:

    setClass("foo", contains = "numeric", representation(flag = "logical"))
    [1] "foo"
    xx = new("foo", rnorm(5))
    setGeneric("order", sig = "...")
    Creating a generic for "order" in package ".GlobalEnv"
        (the supplied definition differs from and overrides the implicit generic
        in package "base": Signatures differ:  (...), (na.last, decreasing))
    [1] "order"
    setMethod("order", "foo", function (..., na.last = TRUE, decreasing = FALSE){message("Method called"); order([EMAIL PROTECTED])})
    [1] "order"
    order(xx)
    Method called
    [1] 2 4 3 1 5

You do need to be calling order() directly from one of your functions, and have it in your namespace, if your package has one.
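For reference, a self-contained variant of the example above in which the method passes the plain numeric data on to base::order(). The method body is a sketch, since the original body is partly redacted in the archive. It assumes a current R, whose order() also has a method argument, so the method repeats that formal to match the generic.

```r
library(methods)

setClass("foo", contains = "numeric", representation(flag = "logical"))

# S4 generic for order() dispatching on "..." (see ?dotsMethods); this
# prints a "Signatures differ" message like the one quoted above
setGeneric("order", signature = "...")

setMethod("order", "foo",
          function(..., na.last = TRUE, decreasing = FALSE,
                   method = c("auto", "shell", "radix")) {
            message("Method called")
            # strip the S4 class so base::order() sees plain numeric vectors
            keys <- lapply(list(...), function(a) a@.Data)
            do.call(base::order,
                    c(keys, list(na.last = na.last, decreasing = decreasing,
                                 method = method)))
          })

xx <- new("foo", c(0.5, -1, 2))
idx <- order(xx)
```

Note that this approach creates an order() generic that masks base::order, which is exactly the side effect raised in the next message.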
Re: [Rd] 'xtfrm' performance (influences 'order' performance) in R devel
Aha, it works if I do setGeneric("order", signature="..."). However, the problem is that it generates a warning on install which I cannot suppress:

    Creating a generic for "order" in package "AHLCalendar"
        (the supplied definition differs from and overrides the implicit generic
        in package "base": Signatures differ:  (...), (na.last, decreasing))

and it generates a warning about masking order from base on load:

    AHLCalendar [0.2.42] (9 Sep 2008). ?AHLCalendar or vignette('AHLCalendar') to get started

    Attaching package: 'AHLCalendar'

        The following object(s) are masked from package:base :

         order

The package exports (excerpt):

    exportPattern("^[^\\.]")
    exportMethods(order)

The reason for these messages is that the signature is different, and I particularly dislike the masking (as I cannot predict whether it leads to other problems somewhere). As I understand it, the current dotsMethods mechanism does not allow mixing dots and other types, so I cannot really define a matching signature. Is that right? Is there a way around it?

As for the rest: yes, I meant generic, and it works nicely for xtfrm. But as I wrote later, the problem is in 'rank', and rank is not generic, so defining a method would not help in calling a different implementation.

Thanks,
Oleg

-----Original Message-----
From: John Chambers
Sent: 09 September 2008 16:42
To: Sklyar, Oleg (London)
Cc: R-devel@r-project.org
Subject: Re: [Rd] 'xtfrm' performance (influences 'order' performance) in R devel