Suharto,

If you're interested in performance with subscripting, you might want to look at pqR (pqR-project.org). It has some substantial performance improvements for subscripting over R Core versions. This is especially true for the current development version of pqR (probably leading to a new release in about a month).
You can look at a somewhat-stable snapshot of recent pqR development at

  https://github.com/radfordneal/pqR/tree/05e32fa6

In particular, src/main/subscript.c might be of interest. Note that you should read mods-dir/README if you want to build this, and in particular, you need to run create-configure in the top-level source directory first.

I modified your tests a bit, including running them both with vectors of length 1e8, as you did (which will not fit in cache), and with vectors of length 1e5 (which will fit in at least the L3 cache). I ran the tests on an Intel Skylake processor (E3-1270v5 @ 3.6GHz), using gcc 7.2 with -O3 -march=native -mtune=native.

I got the following results with R-3.4.2 (with R_ENABLE_JIT=0, which is slightly faster than using the JIT compiler):

R-3.4.2, LARGE VECTORS:

> N <- 1e8; R <- 5
> #N <- 1e5; R <- 1000
>
> x <- numeric(N)
> i <- rep(FALSE, length(x))  # no recycling
> system.time(for (r in 1:R) a <- x[i])
   user  system elapsed
  0.296   0.000   0.297
> i <- FALSE  # recycling
> system.time(for (r in 1:R) a <- x[i])
   user  system elapsed
  0.416   0.000   0.418
>
> x <- numeric(N)
> i <- rep(TRUE, length(x))  # no recycling
> system.time(for (r in 1:R) a <- x[i])
   user  system elapsed
  1.416   0.352   1.773
> i <- TRUE  # recycling
> system.time(for (r in 1:R) a <- x[i])
   user  system elapsed
  1.348   0.264   1.613
>
> x <- numeric(N)
> system.time(for (r in 1:R) a <- x[-1])
   user  system elapsed
  1.516   0.376   1.895
> system.time(for (r in 1:R) a <- x[2:length(x)])
   user  system elapsed
  1.516   0.308   1.827
>
> v <- 2:length(x)
> system.time(for (r in 1:R) a <- x[v])
   user  system elapsed
  1.416   0.268   1.689

R-3.4.2, SMALL VECTORS:

> #N <- 1e8; R <- 5
> N <- 1e5; R <- 1000
>
> x <- numeric(N)
> i <- rep(FALSE, length(x))  # no recycling
> system.time(for (r in 1:R) a <- x[i])
   user  system elapsed
  0.088   0.000   0.089
> i <- FALSE  # recycling
> system.time(for (r in 1:R) a <- x[i])
   user  system elapsed
  0.084   0.000   0.084
>
> x <- numeric(N)
> i <- rep(TRUE, length(x))  # no recycling
> system.time(for (r in 1:R) a <- x[i])
   user  system elapsed
  0.492   0.020   0.515
> i <- TRUE  # recycling
> system.time(for (r in 1:R) a <- x[i])
   user  system elapsed
  0.408   0.008   0.420
>
> x <- numeric(N)
> system.time(for (r in 1:R) a <- x[-1])
   user  system elapsed
  0.508   0.004   0.516
> system.time(for (r in 1:R) a <- x[2:length(x)])
   user  system elapsed
  0.464   0.008   0.473
>
> v <- 2:length(x)
> system.time(for (r in 1:R) a <- x[v])
   user  system elapsed
  0.428   0.000   0.428

Here are the results with the development version of pqR (uncompressed pointers, no byte compilation):

pqR (devel), LARGE VECTORS:

> N <- 1e8; R <- 5
> #N <- 1e5; R <- 1000
>
> x <- numeric(N)
> i <- rep(FALSE, length(x))  # no recycling
> system.time(for (r in 1:R) a <- x[i])
   user  system elapsed
  0.192   0.000   0.193
> i <- FALSE  # recycling
> system.time(for (r in 1:R) a <- x[i])
   user  system elapsed
  0.436   0.000   0.434
>
> x <- numeric(N)
> i <- rep(TRUE, length(x))  # no recycling
> system.time(for (r in 1:R) a <- x[i])
   user  system elapsed
  0.768   0.216   0.988
> i <- TRUE  # recycling
> system.time(for (r in 1:R) a <- x[i])
   user  system elapsed
  0.832   0.272   1.105
>
> x <- numeric(N)
> system.time(for (r in 1:R) a <- x[-1])
   user  system elapsed
  0.280   0.156   0.435
> system.time(for (r in 1:R) a <- x[2:length(x)])
   user  system elapsed
  0.252   0.184   0.436
>
> v <- 2:length(x)
> system.time(for (r in 1:R) a <- x[v])
   user  system elapsed
  0.828   0.168   0.998

pqR (devel), SMALL VECTORS:

> #N <- 1e8; R <- 5
> N <- 1e5; R <- 1000
>
> x <- numeric(N)
> i <- rep(FALSE, length(x))  # no recycling
> system.time(for (r in 1:R) a <- x[i])
   user  system elapsed
  0.040   0.000   0.038
> i <- FALSE  # recycling
> system.time(for (r in 1:R) a <- x[i])
   user  system elapsed
  0.084   0.000   0.087
>
> x <- numeric(N)
> i <- rep(TRUE, length(x))  # no recycling
> system.time(for (r in 1:R) a <- x[i])
   user  system elapsed
  0.156   0.036   0.192
> i <- TRUE  # recycling
> system.time(for (r in 1:R) a <- x[i])
   user  system elapsed
  0.184   0.012   0.195
>
> x <- numeric(N)
> system.time(for (r in 1:R) a <- x[-1])
   user  system elapsed
  0.060   0.012   0.075
> system.time(for (r in 1:R) a <- x[2:length(x)])
   user  system elapsed
  0.052   0.024   0.075
>
> v <- 2:length(x)
> system.time(for (r in 1:R) a <- x[v])
   user  system elapsed
  0.180   0.004   0.182

Summarizing elapsed times (T1 to T7 are the seven timings in the order they appear above):

LARGE VECTORS

             T1     T2     T3     T4     T5     T6     T7
R-3.4.2:  0.297  0.418  1.773  1.613  1.895  1.827  1.689
pqR dev:  0.193  0.434  0.988  1.105  0.435  0.436  0.998

SMALL VECTORS

             T1     T2     T3     T4     T5     T6     T7
R-3.4.2:  0.089  0.084  0.515  0.420  0.516  0.473  0.428
pqR dev:  0.038  0.087  0.192  0.195  0.075  0.075  0.182

As one can see, pqR is substantially faster for all except T2 (where it's about the same). The very large advantage of pqR on T5 and T6 is partly because pqR has special code for efficiently handling things like x[-1] and x[2:length(x)], so I added the x[v] test (T7) to see what performance is like when this special handling isn't invoked.

There's no particular reason pqR's code for these operations couldn't be adapted for use in the R Core implementation, though there are probably a few issues involving large vectors, and the special handling of x[2:length(x)] would require implementing pqR's internal "variant result" mechanism. pqR also has much faster code for some other subset and subset assignment operations.

   Radford Neal

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel