I think the first run is potentially much slower because of a garbage collection. R-devel has an argument `gcFirst' for system.time() which (I think) forces a garbage collection before timing.
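For illustration, a minimal sketch of how that argument might be used, assuming an R version in which system.time() accepts gcFirst. The smaller n and the base-R paste() conversion are choices made here only to keep the run short; they are not the settings from the original post.

```r
# Base R only; n is smaller than in the thread to keep the run quick.
set.seed(1)
n <- 10000
y <- sample(1970:2004, n, replace = TRUE)
m <- sample(1:12,      n, replace = TRUE)
d <- sample(1:28,      n, replace = TRUE)

# gcFirst = TRUE requests a garbage collection before the clock starts.
t_gc   <- system.time(as.POSIXlt(paste(y, m, d, sep = "-")), gcFirst = TRUE)
# gcFirst = FALSE times the expression without that initial collection.
t_nogc <- system.time(as.POSIXlt(paste(y, m, d, sep = "-")), gcFirst = FALSE)

# Elapsed seconds for the two calls, side by side.
print(c(gcFirst = t_gc[["elapsed"]], no_gcFirst = t_nogc[["elapsed"]]))
```

Whether the two numbers differ noticeably will depend on the machine and on what the session has allocated beforehand.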

-roger

Patrick Connolly wrote:
I tried the code that Richard O'Keefe posted last week, to wit:

    library(chron)
    ymd.to.POSIXlt <-
        function (y, m, d) as.POSIXlt(chron(julian(y=y, x=m, d=d)))
    n <- 100000
    y <- sample(1970:2004, n, replace=TRUE)
    m <- sample(1:12,      n, replace=TRUE)
    d <- sample(1:28,      n, replace=TRUE)
    system.time(ymd.to.POSIXlt(y, m, d))
    [1]  8.78  0.10 31.76  0.00  0.00
    system.time(as.POSIXlt(paste(y,m,d, sep="-")))
    [1] 14.64  0.13 53.30  0.00  0.00


On a somewhat newer machine, I got

$ R --vanilla

R : Copyright 2004, The R Foundation for Statistical Computing
Version 1.9.0  (2004-04-12), ISBN 3-900051-00-3

[...]



    library(chron)
    ymd.to.POSIXlt <-
        function (y, m, d) as.POSIXlt(chron(julian(y=y, x=m, d=d)))
    n <- 100000
    y <- sample(1970:2004, n, replace=TRUE)
    m <- sample(1:12,      n, replace=TRUE)
    d <- sample(1:28,      n, replace=TRUE)
    system.time(ymd.to.POSIXlt(y, m, d))
    [1] 1.67 0.24 2.01 0.00 0.00
    system.time(as.POSIXlt(paste(y,m,d, sep="-")))
    [1] 3.06 0.02 3.08 0.00 0.00


But then I tried a few more times...


    system.time(ymd.to.POSIXlt(y, m, d))
    [1] 1.09 0.04 1.13 0.00 0.00
    system.time(ymd.to.POSIXlt(y, m, d))
    [1] 1.11 0.09 1.20 0.00 0.00


The second run is a lot faster, but subsequent ones don't "improve further". But with the "standard" function,


    system.time(as.POSIXlt(paste(y,m,d, sep="-")))
    [1] 2.64 0.02 2.66 0.00 0.00
    system.time(as.POSIXlt(paste(y,m,d, sep="-")))
    [1] 2.82 0.03 2.85 0.00 0.00

... it does improve slightly, but by a lot less.


THEN

If I compare the two methods in the reverse order,


$ R --vanilla

R : Copyright 2004, The R Foundation for Statistical Computing
Version 1.9.0  (2004-04-12), ISBN 3-900051-00-3

[....]



    library(chron)
    ymd.to.POSIXlt <-
        function (y, m, d) as.POSIXlt(chron(julian(y=y, x=m, d=d)))
    n <- 100000
    y <- sample(1970:2004, n, replace=TRUE)
    m <- sample(1:12,      n, replace=TRUE)
    d <- sample(1:28,      n, replace=TRUE)
    system.time(as.POSIXlt(paste(y,m,d, sep="-")))
    [1] 3.66 0.02 3.76 0.00 0.00
    system.time(ymd.to.POSIXlt(y, m, d))
    [1] 1.65 0.05 1.70 0.00 0.00

    system.time(as.POSIXlt(paste(y,m,d, sep="-")))
    [1] 2.59 0.02 2.61 0.00 0.00
    system.time(as.POSIXlt(paste(y,m,d, sep="-")))
    [1] 2.73 0.00 2.74 0.00 0.00
    system.time(ymd.to.POSIXlt(y, m, d))
    [1] 1.29 0.01 1.30 0.00 0.00
    system.time(ymd.to.POSIXlt(y, m, d))
    [1] 0.94 0.00 0.94 0.00 0.00
    system.time(ymd.to.POSIXlt(y, m, d))
    [1] 1.06 0.01 1.07 0.00 0.00



It seems as though the first simulation makes it "easier" for
subsequent simulations of the same type, and also for simulations of a
somewhat different type.  The degree to which it "helps" varies
according to just what is being run (no surprise there).  What I can't
figure out is what makes the second and subsequent runs quicker.

I even tried doing a gc() and setting seeds before each run to make a
more direct comparison, but it made no difference other than making
the timings slightly less variable.  I have seen a similar phenomenon
in other types of simulations.
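A rough sketch of that kind of check, for anyone who wants to repeat it. The helper time_runs() and the wrapper paste_method() are hypothetical names introduced here, not code from the thread, and n is reduced so the loop finishes quickly.

```r
# Hypothetical helper: time f() several times, collecting garbage and
# resetting the seed before every run so each run sees identical inputs.
time_runs <- function(f, reps = 5, seed = 42, n = 10000) {
  sapply(seq_len(reps), function(i) {
    gc()            # explicit garbage collection before the run
    set.seed(seed)  # same random dates on every run
    y <- sample(1970:2004, n, replace = TRUE)
    m <- sample(1:12,      n, replace = TRUE)
    d <- sample(1:28,      n, replace = TRUE)
    # gc() was already called above, so skip the built-in one here.
    system.time(f(y, m, d), gcFirst = FALSE)[["elapsed"]]
  })
}

# The "standard" conversion from the thread, wrapped as a function.
paste_method <- function(y, m, d) as.POSIXlt(paste(y, m, d, sep = "-"))

elapsed <- time_runs(paste_method)
print(elapsed)  # the first element is the first run; the rest are repeats
```

Comparing the first element with the later ones shows whether the first-run penalty survives the explicit gc() and the fixed seed.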

In the case of this code, it makes no difference whether n is 100 or
10000000.  Would that be attributable to lazy evaluation?



    version
             _
    platform i686-pc-linux-gnu
    arch     i686
    os       linux-gnu
    system   i686, linux-gnu
    status
    major    1
    minor    9.0
    year     2004
    month    04
    day      12
    language R



It's not exactly a problem, but it could affect comparisons of processing times, which come up from time to time. In the comparison that gave rise to the code above, the order of the runs would have made a substantial difference to the perceived effectiveness of Richard's code.
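One way to reduce the effect of run order, sketched under assumptions: give each method an untimed warm-up call, then alternate the order across repetitions and compare medians. iso_method() below is a stand-in built on base R's ISOdate() so that the example runs without chron; the chron-based function from the post could be dropped in instead. All names here are illustrative.

```r
set.seed(1)
n <- 10000
y <- sample(1970:2004, n, replace = TRUE)
m <- sample(1:12,      n, replace = TRUE)
d <- sample(1:28,      n, replace = TRUE)

# Two conversions to compare (iso_method is a stand-in, see above).
methods <- list(
  paste = function(y, m, d) as.POSIXlt(paste(y, m, d, sep = "-")),
  iso   = function(y, m, d) as.POSIXlt(ISOdate(y, m, d))
)

# Warm-up: one untimed call per method, so no method pays the
# first-run cost inside the timed loop.
for (f in methods) invisible(f(y, m, d))

reps <- 5
elapsed <- matrix(NA_real_, nrow = reps, ncol = length(methods),
                  dimnames = list(NULL, names(methods)))
for (i in seq_len(reps)) {
  # Alternate which method goes first on odd and even repetitions.
  ord <- if (i %% 2 == 1) names(methods) else rev(names(methods))
  for (nm in ord) {
    elapsed[i, nm] <- system.time(methods[[nm]](y, m, d))[["elapsed"]]
  }
}

# Median elapsed time per method across repetitions.
print(apply(elapsed, 2, median))
```

Medians are used rather than means so that a single slow run has less influence on the comparison.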



-- Roger D. Peng http://www.biostat.jhsph.edu/~rpeng/

______________________________________________
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
