Re: [Rd] Subsetting the "ROW"s of an object

Hervé Pagès Fri, 08 Jun 2018 12:18:32 -0700

A missing subscript is still preferable to a TRUE though, because it
carries the meaning "take it all". A TRUE achieves the same result, but
via implicit recycling. For example, x[ , , ] and x[TRUE, TRUE, TRUE]
return the same thing (if length(x) != 0) and are both no-ops, but the
subsetting code gets a chance to immediately and easily detect the
former as a no-op, whereas it will probably not be able to do so as
easily for the latter. So for the latter it will most likely allocate
a copy of 'x' and fill the new array by walking over all of 'x'.
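
A quick illustration of the equivalence, as a sketch only: the array 'x'
and its dimensions here are made up for the example, and it says nothing
about which form is cheaper internally.

```r
# Hypothetical small 3-d array (not from the thread).
x <- array(seq_len(24), dim = c(2, 3, 4))

a <- x[ , , ]              # missing subscripts: "take it all"
b <- x[TRUE, TRUE, TRUE]   # each TRUE is recycled along its dimension

# Both forms give back an array identical to 'x'.
stopifnot(identical(a, x), identical(b, x))
```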


H.

On 06/08/2018 11:52 AM, Hadley Wickham wrote:

On Fri, Jun 8, 2018 at 11:38 AM, Berry, Charles <ccbe...@ucsd.edu> wrote:

On Jun 8, 2018, at 10:37 AM, Hervé Pagès <hpa...@fredhutch.org> wrote:

Also the TRUEs cause problems if some dimensions are 0:

  > matrix(raw(0), nrow=5, ncol=0)[1:3 , TRUE]
  Error in matrix(raw(0), nrow = 5, ncol = 0)[1:3, TRUE] :
    (subscript) logical subscript too long


OK. But this is easy enough to handle.
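
One possible way to handle it, sketched under assumptions: build the call
with empty (missing) subscripts instead of TRUEs. The helper name
'subset_first_dim' is hypothetical and not something proposed in this
thread.

```r
m <- matrix(raw(0), nrow = 5, ncol = 0)

# Per the report above, a TRUE subscript fails on a 0-extent dimension;
# capture the error message instead of stopping.
res_true <- tryCatch(m[1:3, TRUE, drop = FALSE],
                     error = function(e) conditionMessage(e))

# A missing subscript works on the same matrix.
res_missing <- m[1:3, , drop = FALSE]

# Hypothetical helper: construct x[i, , ..., drop = FALSE] with one
# empty subscript per remaining dimension, then evaluate it.
subset_first_dim <- function(x, i) {
    nd <- max(1L, length(dim(x)))
    empty <- rep(list(quote(expr = )), nd - 1L)   # the "missing" argument
    cl <- as.call(c(list(as.name("["), quote(x), quote(i)),
                    empty, list(drop = FALSE)))
    eval(cl)
}

stopifnot(identical(subset_first_dim(m, 1:3), res_missing))
```

Because the subscripts are genuinely missing, nothing is recycled, so a
zero extent does not matter.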


H.

On 06/08/2018 10:29 AM, Hadley Wickham wrote:

I suspect this will have suboptimal performance since the TRUEs will
get recycled. (Maybe there is, or could be, ALTREP support for
recycling.)

Hadley



AFAICS, it is not an issue. Taking

arr <- array(rnorm(2^22),c(2^10,4,4,4))

as a test case

and using a function that will either use the literal code `x[i,,,,drop=FALSE]' 
or `eval(mc)':

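subset_ROW4 <-
      function(x, i, useLiteral=FALSE)
{
     # Hard-coded call for a 4-d array, using missing subscripts.
     literal <- quote(x[i,,,,drop=FALSE])
     # Equivalent call built programmatically, using TRUE subscripts.
     mc <- quote(x[i])
     nd <- max(1L, length(dim(x)))
     mc[seq(4,length.out=nd-1L)] <- rep(TRUE, nd-1L)
     mc[["drop"]] <- FALSE
     if (useLiteral)
         eval(literal)
     else
         eval(mc)
}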
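
To see what the constructed call looks like and to check that it returns
the same result as the literal form, here is a small self-contained
sketch. The array 'x' and index 'i' are made-up stand-ins, much smaller
than the 'arr' test case.

```r
# Hypothetical 4-d array and row index (not the 'arr' from the thread).
x <- array(seq_len(2 * 3 * 4 * 5), dim = c(2, 3, 4, 5))
i <- 2L

mc <- quote(x[i])        # the call `[`(x, i), which has length 3
nd <- length(dim(x))
# Append one TRUE per remaining dimension, starting at position 4.
mc[seq(4, length.out = nd - 1L)] <- rep(TRUE, nd - 1L)
mc[["drop"]] <- FALSE
print(mc)                # shows the call that will be evaluated

res_built   <- eval(mc)
res_literal <- x[i, , , , drop = FALSE]
stopifnot(identical(res_built, res_literal))
```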

I get identical times with

system.time(for (i in 1:10000) subset_ROW4(arr,seq(1,length=10,by=100),TRUE))

and with

system.time(for (i in 1:10000) subset_ROW4(arr,seq(1,length=10,by=100),FALSE))


I think that's because you used a relatively low-precision timing
mechanism, and included the index generation in the timing. I see:

arr <- array(rnorm(2^22),c(2^10,4,4,4))
i <- seq(1,length = 10, by = 100)

bench::mark(
   arr[i, TRUE, TRUE, TRUE],
   arr[i, , , ]
)
#> # A tibble: 2 x 1
#>   expression        min    mean   median      max  n_gc
#>   <chr>         <bch:t> <bch:t> <bch:tm> <bch:tm> <dbl>
#> 1 arr[i, TRUE,…   7.4µs  10.9µs  10.66µs   1.22ms     2
#> 2 arr[i, , , ]   7.06µs   8.8µs   7.85µs 538.09µs     2

So not a huge difference, but it's there.

Hadley


--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fredhutch.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
