On Thu, 17 Apr 2008, Herve Pages wrote: > Hi, > > Thanks for your answers! > > No need to change anything. In my case, 'x' is guaranteed to be an integer > vector with no NAs so I can call .Internal(is.unsorted(x)) directly. > > BTW, why not make is.unsorted() a little bit more prepared to silly user > input:
Because R is a volunteer project and resources spent on trapping misuse are resources not available to be spent on other things. (Same as for bug reports on fixed issues ....) > > is.unsorted(c(2:5, NA), na.rm=NA) > Error in if (!is.atomic(x) || (!na.rm && any(is.na(x)))) return(NA) : > missing value where TRUE/FALSE needed > > (or at least silently coerce those silly values to TRUE or FALSE like > max()/min() do, following some obscure logic though). > > Also it's arguable that a length-1 vector cannot be considered sorted: > > is.unsorted(NA) > [1] NA > > Cheers, > H. > > > Prof Brian Ripley wrote: >> I wouldn't say 'easy', and so I think we need a business case for this >> change. (One of the issues is that the internals are used elsewhere and >> optimized for inputs without NAs. So we would need to write separate code >> if we pass NAs down to C level. As I recall, is.unsorted was a cheap R >> interface to existing C code.) >> >> What real-world problems are being affected by this, and what would be >> proportional speedup of the whole analysis be from this change? >> >> On Thu, 17 Apr 2008, Bill Dunlap wrote: >> >>> On Thu, 17 Apr 2008, Herve Pages wrote: >>> >>>> Couldn't is.unsorted() bail out immediately here (after comparing >>>> the first 2 elements): >>>> >>>> > x <- 20000000:1 >>>> > system.time(is.unsorted(x), gcFirst=TRUE) >>>> user system elapsed >>>> 0.084 0.040 0.124 >>>> >>>> > x <- 200000000:1 >>>> > system.time(is.unsorted(x), gcFirst=TRUE) >>>> user system elapsed >>>> 0.772 0.440 1.214 >>> >>> The C code does bail out upon seeing the first out- of-order pair, but >>> before calling the C code, the S code does any(is.na(x)), forcing a >>> scan of the entire data. If you remove the is.na calls from >>> is.unsorted's S code you will see the timings improve in your example. >>> (It looks easy to do the NA checks in the C code.) >>> >>> is.unsorted.no.nacheck <- function (x, na.rm = FALSE) { >>> if (is.null(x)) >>> return(FALSE) >>> if (!is.atomic(x)) >>> return(NA) >>> .Internal(is.unsorted(x)) >>> } >>> > x <- 20000000:1 >>> > system.time(is.unsorted(x), gcFirst=TRUE) >>> user system elapsed >>> 0.356 0.157 0.514 >>> > system.time(is.unsorted.no.nacheck(x), gcFirst=TRUE) >>> user system elapsed >>> 0 0 0 >>> > revx <- rev(x) >>> > system.time(is.unsorted(revx), gcFirst=TRUE) >>> user system elapsed >>> 0.500 0.170 0.672 >>> > system.time(is.unsorted.no.nacheck(revx),gcFirst=TRUE) >>> user system elapsed >>> 0.131 0.000 0.132 >>> >>> >>> ---------------------------------------------------------------------------- >>> >>> Bill Dunlap >>> Insightful Corporation >>> bill at insightful dot com >>> 360-428-8146 >>> >>> "All statements in this message represent the opinions of the author and >>> do >>> not necessarily reflect Insightful Corporation policy or position." >>> >>> ______________________________________________ >>> R-devel@r-project.org mailing list >>> https://stat.ethz.ch/mailman/listinfo/r-devel >>> >> > -- Brian D. Ripley, [EMAIL PROTECTED] Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595 ______________________________________________ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel