Currently, in function 'logicalSubscript' in subscript.c, the case of no recycling is handled like the implementation of R function 'which'. It passes through the data only once, but uses more memory. This has been the case since R 3.3.0. For the case of recycling, two passes are done, the first to get the number of elements in the result.
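To make the trade-off concrete, here is a minimal sketch of the two strategies. It is not the code in subscript.c: the function names 'which_one_pass' and 'which_two_pass' are hypothetical, NA values are ignored, plain C arrays stand in for R vectors, and recycling is done with a simple modulo.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch, not the code in subscript.c.
 * One pass, no recycling: collect 1-based positions of nonzero entries in a
 * scratch buffer of worst-case size n, then copy them into a result of the
 * exact size. This is the "one pass, more memory" strategy.
 * Returns a malloc'd array (caller frees) and stores its length in *count. */
size_t *which_one_pass(const int *lgl, size_t n, size_t *count)
{
    *count = 0;
    size_t *buf = malloc((n ? n : 1) * sizeof *buf);
    if (!buf) return NULL;

    size_t len = 0;
    for (size_t i = 0; i < n; i++)
        if (lgl[i]) buf[len++] = i + 1;

    size_t *res = malloc((len ? len : 1) * sizeof *res);
    if (res) {
        memcpy(res, buf, len * sizeof *res);
        *count = len;
    }
    free(buf);
    return res;
}

/* Hypothetical sketch, not the code in subscript.c.
 * Two passes, with recycling: an index of length ns (ns > 0) is recycled to
 * length n. The first pass only counts the selected elements, the second pass
 * fills a result of exactly that size. No oversized scratch buffer is needed,
 * but the index is scanned twice. */
size_t *which_two_pass(const int *lgl, size_t ns, size_t n, size_t *count)
{
    *count = 0;
    if (ns == 0) return NULL;

    size_t len = 0;
    for (size_t i = 0; i < n; i++)          /* pass 1: count */
        if (lgl[i % ns]) len++;

    size_t *res = malloc((len ? len : 1) * sizeof *res);
    if (!res) return NULL;

    size_t k = 0;
    for (size_t i = 0; i < n; i++)          /* pass 2: fill */
        if (lgl[i % ns]) res[k++] = i + 1;

    *count = len;
    return res;
}
```

In the "select all" case, the one-pass version allocates and writes both a full-size scratch buffer and a full-size result, which is one plausible reason for it to be slower there than the two-pass version.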
Also since R 3.3.0, function 'makeSubscript' in subscript.c doesn't call 'duplicate' on a logical index vector.

A side note: I guess that it is safe not to call 'duplicate' on a logical index vector, even if it is the one being modified in subassignment, because it is converted to positive indices before use in extraction or replacement. If so, isn't that true for a character index vector as well?

Here are examples of subsetting a numeric vector of length 10^8 with a logical index vector, inspired by Hong Ooi's answer in
https://stackoverflow.com/questions/17510778/why-is-subsetting-on-a-logical-type-slower-than-subsetting-on-numeric-type .
I present two extreme cases, each with no-recycling and recycling versions that convert to the same positive indices. The difference between the two versions can be attributed to function 'logicalSubscript'.

Example 1: select none
x <- numeric(1e8)
i <- rep(FALSE, length(x))# no recycling
system.time(x[i])
system.time(x[i])
i <- FALSE# recycling
system.time(x[i])
system.time(x[i])

Output:
   user  system elapsed
  0.083   0.000   0.083
   user  system elapsed
  0.085   0.000   0.085
   user  system elapsed
  0.144   0.000   0.144
   user  system elapsed
  0.143   0.000   0.144

Example 2: select all
x <- numeric(1e8)
i <- rep(TRUE, length(x))# no recycling
system.time(x[i])
system.time(x[i])
i <- TRUE# recycling
system.time(x[i])
system.time(x[i])

Output:
   user  system elapsed
  0.538   0.741   1.292
   user  system elapsed
  0.506   0.668   1.175
   user  system elapsed
  0.448   0.534   0.986
   user  system elapsed
  0.431   0.528   0.960

The results were from R 3.3.2 on http://rextester.com/l/r_online_compiler .

The no-recycling version took longer than the recycling version in example 2, which was also the slower of the two examples in both versions.

Function 'logicalSubscript' in subscript.c in R 3.2.x also uses faster code for the case of no recycling, but does two passes in all cases. Its treatment of the case of recycling is identical to the current code.
Function 'logicalSubscript' in subscript.c affects subsetting with negative indices, because negative indices are first converted to a logical index vector with the same length as the vector being subsetted (no recycling).

Example, comparing the times of x[-1] and its equivalent, x[2:length(x)]:
x <- numeric(1e8)
system.time(x[-1])
system.time(x[-1])
system.time(x[2:length(x)])
system.time(x[2:length(x)])

Output from R 3.3.2 on http://rextester.com/l/r_online_compiler :
   user  system elapsed
  0.591   0.903   1.515
   user  system elapsed
  0.558   0.822   1.384
   user  system elapsed
  0.620   0.659   1.285
   user  system elapsed
  0.607   0.663   1.274

Output from R 3.2.2 in Zeppelin Notebook, https://my.datascientistworkbench.com/tools/zeppelin-notebook/ :
   user  system elapsed
  1.156   1.636   2.794
   user  system elapsed
  0.884   1.528   2.413
   user  system elapsed
  0.932   1.544   2.476
   user  system elapsed
  0.932   1.584   2.519

From the above, x[-1] apparently took consistently longer than x[2:length(x)] with R 3.3.2, but not with R 3.2.2.

So, how about reverting to the R 3.2.x code of function 'logicalSubscript' in subscript.c?

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel