Did you read the help page?

       x: numeric vector whose sample quantiles are wanted.  ‘NA’ and
          ‘NaN’ values are not allowed unless ‘na.rm’ is ‘TRUE’.

so only 'numeric' vectors are really supported, although it does say

     The default method does not allow factors, but works with objects
     sufficiently like numeric vectors that ‘sort’, addition and
     multiplication work correctly.  In principle only sorts and
     weighted means are needed, so datetimes could have quantiles - but
     this is not implemented.

There is no claim that it works (let alone works well) for class "difftime". If you follow the link to 'sort' it says

     The default ‘sort’ method makes use of ‘order’ for objects with
     classes, which in turn makes use of the generic function ‘xtfrm’.

and from ?xtfrm

     The default method will make use of ‘==’ and ‘>’ methods for the
     class of ‘x[i]’ (for integers ‘i’), and the ‘is.na’ method for the
     class of ‘x’, but might be rather slow when doing so.

So, if you want this to be fast, you need to write an xtfrm method. There is one in R-devel:

xtfrm.difftime
function (x)
as.numeric(x)

and you can use that in your workspace. Your example is fast in R-devel, I think because of that function, although I cannot be sure, since other development work is in progress in the version I tried.
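
A minimal sketch of that workaround, assuming a version such as 2.10.0 that has no xtfrm method for "difftime" yet (where one already exists, as in R-devel, defining it again should be harmless but unnecessary). It rebuilds the x and d objects from the report below; no timings are shown because they depend on the machine.

```r
# Workspace copy of the R-devel method quoted above: map a difftime
# to the plain numeric vector that sort()/order() can work on directly.
xtfrm.difftime <- function(x) as.numeric(x)

# Same data as in the original report.
x <- as.Date(1:10000, origin = "1900-01-01")
d <- x - as.Date("1900-01-01")

# With the method defined, ordering a classed object no longer has to
# fall back on element-wise '==' and '>' comparisons.
print(system.time(s <- summary(d)))
print(s)
```

If the timing is still poor after defining the method, it is worth checking with Rprof() whether the time is going somewhere other than the sort.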


On Fri, 27 Nov 2009, [email protected] wrote:

Full_Name: Hong Ooi
Version: 2.10.0
OS: Windows XP
Submission from: (NULL) (203.110.235.1)


While trying to get summary statistics on a duration variable (the difference
between a start and end date), I ran into the following issue. Using summary or
quantile (which summary calls) on a difftime object takes an extremely long time
if the object is even moderately large.

A reproducible example:

x <- as.Date(1:10000, origin="1900-01-01")
x[1:10]
[1] "1900-01-02" "1900-01-03" "1900-01-04" "1900-01-05" "1900-01-06"
[6] "1900-01-07" "1900-01-08" "1900-01-09" "1900-01-10" "1900-01-11"
d <- x - as.Date("1900-01-01")
d[1:10]
Time differences in days
[1]  1  2  3  4  5  6  7  8  9 10
system.time(summary(d[1:10]))
  user  system elapsed
  0.01    0.00    0.01
system.time(summary(d[1:100]))
  user  system elapsed
  0.21    0.00    0.20
system.time(summary(d[1:1000]))
  user  system elapsed
  3.02    0.00    3.02
system.time(summary(d[1:10000]))
  user  system elapsed
 43.56    0.04   43.66


If I unclass d, there is no problem:

system.time(summary(unclass(d[1:10000])))
  user  system elapsed
     0       0       0

Testing with Rprof() indicates that the problem lies in [.difftime, although the
code for that function seems innocuous enough.


sessionInfo()
R version 2.10.0 (2009-10-26)
i386-pc-mingw32

locale:
[1] LC_COLLATE=English_Australia.1252  LC_CTYPE=English_Australia.1252
[3] LC_MONETARY=English_Australia.1252 LC_NUMERIC=C
[5] LC_TIME=English_Australia.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

______________________________________________
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


--
Brian D. Ripley,                  [email protected]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595