While doing some performance testing with the new version of pqR (see pqR-project.org), I've encountered an extreme, and quite unnecessary, inefficiency in the current R Core implementation of R, which I think you might want to correct.

The inefficiency is in access to columns of a data frame, as in expressions such as df$col[i], which I think are very common (the alternatives of df[i,"col"] and df[["col"]][i] are, I think, less common).

Here is the setup for an example showing the issue:

  L <- list (abc=1:9, xyz=11:19)
  Lc <- L; class(Lc) <- "glub"
  df <- data.frame(L)

And here are some times for R-3.5.2 (r-devel of 2019-02-01 is much the same):

  > system.time (for (i in 1:1000000) r <- L$xyz)
     user  system elapsed
    0.086   0.004   0.089
  > system.time (for (i in 1:1000000) r <- Lc$xyz)
     user  system elapsed
    0.494   0.000   0.495
  > system.time (for (i in 1:1000000) r <- df$xyz)
     user  system elapsed
    3.425   0.000   3.426

So accessing a column of a data frame is 38 times slower than accessing a list element (which is what happens in the underlying implementation of a data frame), and 7 times slower than accessing an element of a list with a class attribute (for which it's necessary to check whether there is a $.glub method, which there isn't here).

For comparison, here are the times for pqR-2019-01-25:

  > system.time (for (i in 1:1000000) r <- L$xyz)
     user  system elapsed
    0.057   0.000   0.058
  > system.time (for (i in 1:1000000) r <- Lc$xyz)
     user  system elapsed
    0.251   0.000   0.251
  > system.time (for (i in 1:1000000) r <- df$xyz)
     user  system elapsed
    0.247   0.000   0.247

So when accessing df$xyz, R-3.5.2 is 14 times slower than pqR-2019-01-25. (For a partial match, like df$xy, R-3.5.2 is 34 times slower.)

I wasn't surprised that pqR was faster, but I didn't expect this big a difference. Then I remembered having seen a NEWS item from R-3.1.0:

  * Partial matching when using the $ operator _on data frames_ now
    throws a warning and may become defunct in the future. If partial
    matching is intended, replace foo$bar by foo[["bar", exact = FALSE]].

and having looked at the code then:

  `$.data.frame` <- function(x,name) {
    a <- x[[name]]
    if (!is.null(a)) return(a)

    a <- x[[name, exact=FALSE]]
    if (!is.null(a)) warning("Name partially matched in data frame")
    return(a)
  }

I recall thinking at the time that this involved a pretty big performance hit, compared to letting the primitive $ operator do it, just to produce a warning. But it wasn't until now that I noticed this NEWS in R-3.1.1:

  * The warning when using partial matching with the $ operator on
    data frames is now only given when
    options("warnPartialMatchDollar") is TRUE.

for which the code was changed to:

  `$.data.frame` <- function(x,name) {
    a <- x[[name]]
    if (!is.null(a)) return(a)

    a <- x[[name, exact=FALSE]]
    if (!is.null(a) && getOption("warnPartialMatchDollar", default=FALSE)) {
      names <- names(x)
      warning(gettextf("Partial match of '%s' to '%s' in data frame",
                       name, names[pmatch(name, names)]))
    }
    return(a)
  }

One can see the effect now when warnPartialMatchDollar is enabled:

  > options(warnPartialMatchDollar=TRUE)
  > Lc$xy
  [1] 11 12 13 14 15 16 17 18 19
  Warning message:
  In Lc$xy : partial match of 'xy' to 'xyz'
  > df$xy
  [1] 11 12 13 14 15 16 17 18 19
  Warning message:
  In `$.data.frame`(df, xy) : Partial match of 'xy' to 'xyz' in data frame

So the only thing that slowing down accesses like df$xyz by a factor of seven achieves now is to add the words "in data frame" to the warning message (while making the earlier part of the message less intelligible).

I think you might want to just delete the definition of $.data.frame, reverting to the situation before R-3.1.0.

   Radford Neal

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel