Currently, in function 'logicalSubscript' in subscript.c, the case of no 
recycling is handled like the implementation of R function 'which'. It passes 
through the data only once, but uses more memory. This has been so since 
R 3.3.0. For the case of recycling, two passes are done, the first to get the 
number of elements in the result.
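
To make the two strategies concrete, here is a rough R-level sketch. It is 
only an illustration, not the C code in subscript.c. The function names 
'one_pass' and 'two_pass' are made up, and the sketch assumes a non-empty 
index with no NA.

```r
# One pass, 'which'-style: allocate a full-length buffer, fill it, then keep
# only the part that was used. Assumes 'i' has the same length as the vector.
one_pass <- function(i) {
  buf <- integer(length(i))      # worst-case allocation
  k <- 0L
  for (j in seq_along(i)) {
    if (i[j]) {
      k <- k + 1L
      buf[k] <- j
    }
  }
  buf[seq_len(k)]
}

# Two passes with recycling: count the selected elements first, then allocate
# exactly that many and fill them. 'i' is recycled to length 'n'.
two_pass <- function(i, n) {
  ni <- length(i)
  count <- 0L
  for (j in seq_len(n)) {
    if (i[(j - 1L) %% ni + 1L]) count <- count + 1L
  }
  out <- integer(count)
  k <- 0L
  for (j in seq_len(n)) {
    if (i[(j - 1L) %% ni + 1L]) {
      k <- k + 1L
      out[k] <- j
    }
  }
  out
}
```

For example, one_pass(c(TRUE, FALSE, TRUE)) gives the positions 1 and 3, and 
two_pass(c(TRUE, FALSE), 5L) gives 1, 3 and 5.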

Also since R 3.3.0, function 'makeSubscript' in subscript.c doesn't call 
'duplicate' on a logical index vector.

A side note: I guess that it is safe not to call 'duplicate' on a logical index 
vector, even if it is the one being modified in subassignment, because it is 
converted to positive indices before use in extraction or replacement. If so, 
isn't the same true for a character index vector as well?

Here are examples of subsetting a numeric vector of length 10^8 with a logical 
index vector, inspired by Hong Ooi's answer in 
https://stackoverflow.com/questions/17510778/why-is-subsetting-on-a-logical-type-slower-than-subsetting-on-numeric-type
 . I present two extreme cases, each with no-recycling and recycling versions 
that convert to the same positive indices. The difference between the two 
versions can be attributed to function 'logicalSubscript'.
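
As a small-scale check that a no-recycling index and its recycling 
counterpart select the same elements, something like the following can be 
run. The vector here is just a toy example, much shorter than the one used in 
the timings.

```r
x <- c(1.5, 2.5, 3.5)

# "Select all": full-length index versus recycled scalar index
all_same <- identical(x[rep(TRUE, length(x))], x[TRUE])

# "Select none": full-length index versus recycled scalar index
none_same <- identical(x[rep(FALSE, length(x))], x[FALSE])

# Both forms correspond to the same positive indices
idx_same <- identical(which(rep(TRUE, length(x))),
                      which(rep_len(TRUE, length(x))))
```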

Example 1: select none
x <- numeric(1e8)
i <- rep(FALSE, length(x))  # no recycling
system.time(x[i])
system.time(x[i])
i <- FALSE  # recycling
system.time(x[i])
system.time(x[i])

Output:
   user  system elapsed 
  0.083   0.000   0.083 
   user  system elapsed 
  0.085   0.000   0.085 
   user  system elapsed 
  0.144   0.000   0.144 
   user  system elapsed 
  0.143   0.000   0.144 

Example 2: select all
x <- numeric(1e8)
i <- rep(TRUE, length(x))  # no recycling
system.time(x[i])
system.time(x[i])
i <- TRUE  # recycling
system.time(x[i])
system.time(x[i])

Output:
   user  system elapsed 
  0.538   0.741   1.292 
   user  system elapsed 
  0.506   0.668   1.175 
   user  system elapsed 
  0.448   0.534   0.986 
   user  system elapsed 
  0.431   0.528   0.960 

The results were from R 3.3.2 on http://rextester.com/l/r_online_compiler . The 
no-recycling version took longer than the recycling version in example 2, 
where both versions took more time than in example 1.

Function 'logicalSubscript' in subscript.c in R 3.2.x also uses faster code 
for the case of no recycling, but does two passes in all cases. Treatment of 
the case of recycling is identical to the current code.

Function 'logicalSubscript' in subscript.c also affects subsetting with 
negative indices, because negative indices are first converted to a logical 
index vector with the same length as the vector (no recycling).
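
The conversion can be pictured with the R-level sketch below. The helper name 
'neg_to_logical' is made up for illustration, and the sketch only mimics the 
idea, not the C code itself.

```r
# Turn negative indices into a full-length logical "keep" vector.
# 'neg' holds negative (or zero) indices; 'n' is the length of the vector.
neg_to_logical <- function(neg, n) {
  keep <- rep(TRUE, n)
  drop <- abs(neg)
  drop <- drop[drop <= n]   # out-of-range negative indices are ignored
  keep[drop] <- FALSE       # a zero index selects nothing here, so no effect
  keep
}

x <- c(10, 20, 30, 40)
same_result <- identical(x[-1], x[neg_to_logical(-1, length(x))])
```

The resulting logical vector has the same length as x, so it goes through the 
no-recycling branch.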

Example, comparing times of x[-1] and its equivalent, x[2:length(x)] :
x <- numeric(1e8)
system.time(x[-1])
system.time(x[-1])
system.time(x[2:length(x)])
system.time(x[2:length(x)])

Output from R 3.3.2 on http://rextester.com/l/r_online_compiler :
   user  system elapsed 
  0.591   0.903   1.515 
   user  system elapsed 
  0.558   0.822   1.384 
   user  system elapsed 
  0.620   0.659   1.285 
   user  system elapsed 
  0.607   0.663   1.274 

Output from R 3.2.2 in Zeppelin Notebook, 
https://my.datascientistworkbench.com/tools/zeppelin-notebook/ :
   user  system elapsed 
  1.156   1.636   2.794 
   user  system elapsed 
  0.884   1.528   2.413 
   user  system elapsed 
  0.932   1.544   2.476 
   user  system elapsed 
  0.932   1.584   2.519

From the above, x[-1] apparently took consistently longer than 
x[2:length(x)] with R 3.3.2, but not with R 3.2.2.

So, how about reverting to the R 3.2.x code of function 'logicalSubscript' in 
subscript.c?

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
