Re: [Rd] Date class shows Inf as NA; this confuses the use of is.na()
Indeed. as_date is from lubridate, but the same holds for as.Date.

The output and its interpretation should be consistent; otherwise it leads to confusion when programming. I only understood that the difference exists after asking a question on Stack Overflow:
  https://stackoverflow.com/q/50766089/914686
The documentation never mentions that an Inf date is displayed as NA:
  https://www.rdocumentation.org/packages/base/versions/3.5.0/topics/as.Date

So I'm of the impression that the display should be fixed as a first option (thereby providing clarity/transparency in terms of back-end and output), or the documentation amended (to highlight this) as a second option.

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] Subsetting the "ROW"s of an object
> On Jun 8, 2018, at 2:15 PM, Hadley Wickham wrote:
>
> On Fri, Jun 8, 2018 at 2:09 PM, Berry, Charles wrote:
>>
>>> On Jun 8, 2018, at 1:49 PM, Hadley Wickham wrote:
>>>
>>> Hmmm, yes, there must be some special case in the C code to avoid
>>> recycling a length-1 logical vector:
>>
>> Here is a version that (I think) handles Herve's issue of arrays having one
>> or more 0 dimensions.
>>
>> subset_ROW <-
>>     function(x,i)
>> {
>>     dims <- dim(x)
>>     index_list <- which(dims[-1] != 0L) + 3
>>     mc <- quote(x[i])
>>     nd <- max(1L, length(dims))
>>     mc[ index_list ] <- list(TRUE)
>>     mc[[ nd + 3L ]] <- FALSE
>>     names( mc )[ nd+3L ] <- "drop"
>>     eval(mc)
>> }
>>
>> Curiously enough the timing is *much* better for this implementation than
>> for the first version I sent.
>>
>> Constructing a version of `mc' that looks like `x[i,,,,drop=FALSE]' can be
>> done with `alist(a=)' in place of `list(TRUE)' in the earlier version but
>> seems to slow things down noticeably. It requires almost twice (!!) as much
>> time as the version above.
>
> I think that's probably because alist() is a slow way to generate a
> missing symbol:
>
> bench::mark(
>   alist(x = ),
>   list(x = quote(expr = )),
>   check = FALSE
> )[1:5]
> #> # A tibble: 2 x 5
> #>   expression                    min     mean   median      max
> #> 1 alist(x = )                 2.8µs   3.54µs   3.29µs   34.9µs
> #> 2 list(x = quote(expr = ))    169ns 219.38ns    181ns   24.2µs
>
> (note the units)

Yes. That is good for about half the difference.

And I guess the rest is getting rid of seq().

This seems a bit quicker than anything else and satisfies Herve's objections:

subset_ROW <- function(x,i) {
    dims <- dim(x)
    nd <- length(dims)
    index_list <- if (nd > 1) 2L + 2L:nd else 0
    mc <- quote(x[i])
    mc[ index_list ] <- list(quote(expr=))
    mc[[ "drop" ]] <- FALSE
    eval(mc)
}

Chuck
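A quick way to try the last version posted above: the sketch below repeats that subset_ROW() definition and runs it on a few inputs, including a matrix with a zero-extent dimension. The test objects (m0, a3, df) are made up for illustration and do not come from the thread.

```r
# subset_ROW() as posted in the message above.
subset_ROW <- function(x, i) {
    dims <- dim(x)
    nd <- length(dims)
    # Positions 4..(nd + 2) of the call x[i] hold the remaining subscripts.
    index_list <- if (nd > 1) 2L + 2L:nd else 0
    mc <- quote(x[i])
    mc[ index_list ] <- list(quote(expr=))   # fill them with the missing argument
    mc[[ "drop" ]] <- FALSE
    eval(mc)
}

# Hypothetical test objects (assumptions, not from the thread).
m0 <- matrix(raw(0), nrow = 5, ncol = 0)     # second dimension has extent 0
a3 <- array(1:24, c(4, 3, 2))
df <- data.frame(x = 1:10, y = 10:1)

dim_m0  <- dim(subset_ROW(m0, 1:3))
same_a3 <- identical(subset_ROW(a3, 2:3), a3[2:3, , , drop = FALSE])
vec_out <- subset_ROW(1:10, 4:6)
df_out  <- subset_ROW(df, 2:4)
```

Because the trailing subscripts are missing rather than TRUE, the zero-extent case should go through the ordinary "take everything" path instead of hitting the logical-subscript length check.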
Re: [Rd] Subsetting the "ROW"s of an object
On 06/08/2018 02:15 PM, Hadley Wickham wrote: On Fri, Jun 8, 2018 at 2:09 PM, Berry, Charles wrote: On Jun 8, 2018, at 1:49 PM, Hadley Wickham wrote: Hmmm, yes, there must be some special case in the C code to avoid recycling a length-1 logical vector: Here is a version that (I think) handles Herve's issue of arrays having one or more 0 dimensions. subset_ROW <- function(x,i) { dims <- dim(x) index_list <- which(dims[-1] != 0L) + 3 mc <- quote(x[i]) nd <- max(1L, length(dims)) mc[ index_list ] <- list(TRUE) mc[[ nd + 3L ]] <- FALSE names( mc )[ nd+3L ] <- "drop" eval(mc) } Curiously enough the timing is *much* better for this implementation than for the first version I sent. Constructing a version of `mc' that looks like `x[idrop=FALSE]' can be done with `alist(a=)' in place of `list(TRUE)' in the earlier version but seems to slow things down noticeably. It requires almost twice (!!) as much time as the version above. I think that's probably because alist() is a slow way to generate a missing symbol: bench::mark( alist(x = ), list(x = quote(expr = )), check = FALSE )[1:5] #> # A tibble: 2 x 5 #> expressionmin mean median max #> #> 1 alist(x = ) 2.8µs 3.54µs 3.29µs 34.9µs #> 2 list(x = quote(expr = ))169ns 219.38ns181ns 24.2µs (note the units) That's a good one. Need to change this in S4Vectors::default_extractROWS() and other places. Thanks! H. Hadley -- Hervé Pagès Program in Computational Biology Division of Public Health Sciences Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N, M1-B514 P.O. Box 19024 Seattle, WA 98109-1024 E-mail: hpa...@fredhutch.org Phone: (206) 667-5791 Fax:(206) 667-1319 __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] Subsetting the "ROW"s of an object
On Fri, Jun 8, 2018 at 2:09 PM, Berry, Charles wrote:
>
>> On Jun 8, 2018, at 1:49 PM, Hadley Wickham wrote:
>>
>> Hmmm, yes, there must be some special case in the C code to avoid
>> recycling a length-1 logical vector:
>
> Here is a version that (I think) handles Herve's issue of arrays having one
> or more 0 dimensions.
>
> subset_ROW <-
>     function(x,i)
> {
>     dims <- dim(x)
>     index_list <- which(dims[-1] != 0L) + 3
>     mc <- quote(x[i])
>     nd <- max(1L, length(dims))
>     mc[ index_list ] <- list(TRUE)
>     mc[[ nd + 3L ]] <- FALSE
>     names( mc )[ nd+3L ] <- "drop"
>     eval(mc)
> }
>
> Curiously enough the timing is *much* better for this implementation than for
> the first version I sent.
>
> Constructing a version of `mc' that looks like `x[i,,,,drop=FALSE]' can be
> done with `alist(a=)' in place of `list(TRUE)' in the earlier version but
> seems to slow things down noticeably. It requires almost twice (!!) as much
> time as the version above.

I think that's probably because alist() is a slow way to generate a
missing symbol:

bench::mark(
  alist(x = ),
  list(x = quote(expr = )),
  check = FALSE
)[1:5]
#> # A tibble: 2 x 5
#>   expression                    min     mean   median      max
#> 1 alist(x = )                 2.8µs   3.54µs   3.29µs   34.9µs
#> 2 list(x = quote(expr = ))    169ns 219.38ns    181ns   24.2µs

(note the units)

Hadley

--
http://hadley.nz
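The two expressions benchmarked above differ in speed, but they should build the same object: a list whose single element is the empty ("missing") symbol. The sketch below checks that and uses either form to fill a skipped subscript in a do.call(). The matrix m and the index rows are made-up test values, not from the thread.

```r
# Both forms give a one-element list holding the empty symbol.
via_alist <- alist(x = )
via_quote <- list(x = quote(expr = ))
same_list <- identical(via_alist, via_quote)

# Hypothetical test data (assumptions, not from the thread).
m    <- matrix(1:6, nrow = 3)
rows <- 2:3

# Names are stripped so the empty symbol is passed positionally to `[`.
res_alist <- do.call(`[`, c(list(m, rows), unname(via_alist), list(drop = FALSE)))
res_quote <- do.call(`[`, c(list(m, rows), unname(via_quote), list(drop = FALSE)))
```

Both calls are meant to be equivalent to writing m[rows, , drop = FALSE] by hand.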
Re: [Rd] Subsetting the "ROW"s of an object
> On Jun 8, 2018, at 1:49 PM, Hadley Wickham wrote:
>
> Hmmm, yes, there must be some special case in the C code to avoid
> recycling a length-1 logical vector:

Here is a version that (I think) handles Herve's issue of arrays having one
or more 0 dimensions.

subset_ROW <-
    function(x,i)
{
    dims <- dim(x)
    index_list <- which(dims[-1] != 0L) + 3
    mc <- quote(x[i])
    nd <- max(1L, length(dims))
    mc[ index_list ] <- list(TRUE)
    mc[[ nd + 3L ]] <- FALSE
    names( mc )[ nd+3L ] <- "drop"
    eval(mc)
}

Curiously enough the timing is *much* better for this implementation than for
the first version I sent.

Constructing a version of `mc' that looks like `x[i,,,,drop=FALSE]' can be
done with `alist(a=)' in place of `list(TRUE)' in the earlier version but
seems to slow things down noticeably. It requires almost twice (!!) as much
time as the version above.

Best,

Chuck
Re: [Rd] Date class shows Inf as NA; this confuses the use of is.na()
> as_date
Error: object 'as_date' not found

Must be from some not-named package...

But don't confuse the format of an object when printed with its underlying value:

> as.Date(Inf,origin = '1970-01-01')
[1] NA
> str(as.Date(Inf,origin = '1970-01-01'))
 Date[1:1], format: NA
> as.numeric(as.Date(Inf,origin = '1970-01-01'))
[1] Inf
> is.na(Inf)
[1] FALSE
> is.na(as.Date(Inf,origin = '1970-01-01'))
[1] FALSE

> str(as.Date(27,origin = '1970-01-01'))
 Date[1:1], format: "1970-01-28"
> as.numeric(as.Date(27,origin = '1970-01-01'))
[1] 27

--
Don MacQueen
Lawrence Livermore National Laboratory
7000 East Ave., L-627
Livermore, CA 94550
925-423-1062
Lab cell 925-724-7509

On 6/8/18, 1:02 PM, "R-devel on behalf of Werner Grundlingh" wrote:

    In the following example, the date class shows Inf as NA

    > as_date(Inf, origin = '1970-01-01')
    [1] NA

    This is misleading as is.na() reports incorrectly

    > is.na(as_date(Inf, origin = '1970-01-01'))
    [1] FALSE

    The correct approach here would probably be to have an Inf (and -Inf)
    *displayed* rather than NA.
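Following the distinction drawn above between the printed form and the stored value, one way to test for such dates in code is to look at the underlying number rather than at the output. This is only a sketch: is_inf_date() is a hypothetical helper name, not a base R or lubridate function.

```r
# A Date built from Inf: the stored value is Inf, whatever print() shows.
d <- as.Date(Inf, origin = "1970-01-01")

# Hypothetical helper (assumption): TRUE where the stored value is +/-Inf.
is_inf_date <- function(x) is.infinite(unclass(x))

na_flag     <- is.na(d)         # FALSE, as shown in the transcript above
finite_flag <- is.finite(d)     # checks the stored number, not the printed text
inf_flag    <- is_inf_date(d)

# An ordinary date and a genuine NA date for comparison.
ordinary <- is_inf_date(as.Date("1970-01-28"))
missing  <- is_inf_date(as.Date(NA))
```

Checking is.finite() alongside is.na() lets code tell a genuinely missing date from an infinite one even when both print the same way.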
Re: [Rd] Subsetting the "ROW"s of an object
The C code for subsetting doesn't need to recycle a logical subscript. It only needs to walk on it and start again at the beginning of the vector when it reaches the end. Not exactly the same as detecting the "take everything along that dimension" situation though. x[TRUE, TRUE, TRUE] triggers the full subsetting machinery when x[] and x[ , , ] could (and should) easily avoid it. H. On 06/08/2018 01:49 PM, Hadley Wickham wrote: Hmmm, yes, there must be some special case in the C code to avoid recycling a length-1 logical vector: dims <- c(4, 4, 4, 1e5) arr <- array(rnorm(prod(dims)), dims) dim(arr) #> [1] 4 4 4 10 i <- c(1, 3) bench::mark( arr[i, TRUE, TRUE, TRUE], arr[i, , , ] )[c("expression", "min", "mean", "max")] #> # A tibble: 2 x 4 #> expressionmin mean max #> #> 1 arr[i, TRUE, TRUE, TRUE] 41.8ms 43.6ms 46.5ms #> 2 arr[i, , , ] 41.7ms 43.1ms 46.3ms On Fri, Jun 8, 2018 at 12:31 PM, Berry, Charles wrote: On Jun 8, 2018, at 11:52 AM, Hadley Wickham wrote: On Fri, Jun 8, 2018 at 11:38 AM, Berry, Charles wrote: On Jun 8, 2018, at 10:37 AM, Hervé Pagès wrote: Also the TRUEs cause problems if some dimensions are 0: matrix(raw(0), nrow=5, ncol=0)[1:3 , TRUE] Error in matrix(raw(0), nrow = 5, ncol = 0)[1:3, TRUE] : (subscript) logical subscript too long OK. But this is easy enough to handle. H. On 06/08/2018 10:29 AM, Hadley Wickham wrote: I suspect this will have suboptimal performance since the TRUEs will get recycled. (Maybe there is, or could be, ALTREP, support for recycling) Hadley AFAICS, it is not an issue. 
Taking arr <- array(rnorm(2^22),c(2^10,4,4,4)) as a test case and using a function that will either use the literal code `x[idrop=FALSE]' or `eval(mc)': subset_ROW4 <- function(x, i, useLiteral=FALSE) { literal <- quote(x[idrop=FALSE]) mc <- quote(x[i]) nd <- max(1L, length(dim(x))) mc[seq(4,length=nd-1L)] <- rep(TRUE, nd-1L) mc[["drop"]] <- FALSE if (useLiteral) eval(literal) else eval(mc) } I get identical times with system.time(for (i in 1:1) subset_ROW4(arr,seq(1,length=10,by=100),TRUE)) and with system.time(for (i in 1:1) subset_ROW4(arr,seq(1,length=10,by=100),FALSE)) I think that's because you used a relatively low precision timing mechnaism, and included the index generation in the timing. I see: arr <- array(rnorm(2^22),c(2^10,4,4,4)) i <- seq(1,length = 10, by = 100) bench::mark( arr[i, TRUE, TRUE, TRUE], arr[i, , , ] ) #> # A tibble: 2 x 1 #> expressionminmean median max n_gc #> #> 1 arr[i, TRUE,… 7.4µs 10.9µs 10.66µs 1.22ms 2 #> 2 arr[i, , , ] 7.06µs 8.8µs 7.85µs 538.09µs 2 So not a huge difference, but it's there. Funny. I get similar results to yours above albeit with smaller differences. Usually < 5 percent. But with subset_ROW4 I see no consistent difference. In this example, it runs faster on average using `eval(mc)' to return the result: arr <- array(rnorm(2^22),c(2^10,4,4,4)) i <- seq(1,length=10,by=100) bench::mark(subset_ROW4(arr,i,FALSE), subset_ROW4(arr,i,TRUE))[,1:8] # A tibble: 2 x 8 expression min mean median max `itr/sec` mem_alloc n_gc 1 subset_ROW4(arr, i, FALSE) 28.9µs 34.9µs 32.1µs 1.36ms28686. 5.05KB 5 2 subset_ROW4(arr, i, TRUE)28.9µs 35µs 32.4µs 875.11µs28572. 5.05KB 5 And on subsequent reps the lead switches back and forth. Chuck -- Hervé Pagès Program in Computational Biology Division of Public Health Sciences Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N, M1-B514 P.O. 
Box 19024 Seattle, WA 98109-1024 E-mail: hpa...@fredhutch.org Phone: (206) 667-5791 Fax:(206) 667-1319 __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] Subsetting the "ROW"s of an object
Actually, it's sort of the opposite. Everything becomes a sequence of integers internally, even when the argument is missing. So the same amount of work is done, basically. ALTREP will let us improve this sort of thing. Michael On Fri, Jun 8, 2018 at 1:49 PM, Hadley Wickham wrote: > Hmmm, yes, there must be some special case in the C code to avoid > recycling a length-1 logical vector: > > dims <- c(4, 4, 4, 1e5) > > arr <- array(rnorm(prod(dims)), dims) > dim(arr) > #> [1] 4 4 4 10 > i <- c(1, 3) > > bench::mark( > arr[i, TRUE, TRUE, TRUE], > arr[i, , , ] > )[c("expression", "min", "mean", "max")] > #> # A tibble: 2 x 4 > #> expressionmin mean max > #> > #> 1 arr[i, TRUE, TRUE, TRUE] 41.8ms 43.6ms 46.5ms > #> 2 arr[i, , , ] 41.7ms 43.1ms 46.3ms > > > On Fri, Jun 8, 2018 at 12:31 PM, Berry, Charles wrote: >> >> >>> On Jun 8, 2018, at 11:52 AM, Hadley Wickham wrote: >>> >>> On Fri, Jun 8, 2018 at 11:38 AM, Berry, Charles wrote: > On Jun 8, 2018, at 10:37 AM, Hervé Pagès wrote: > > Also the TRUEs cause problems if some dimensions are 0: > >> matrix(raw(0), nrow=5, ncol=0)[1:3 , TRUE] > Error in matrix(raw(0), nrow = 5, ncol = 0)[1:3, TRUE] : > (subscript) logical subscript too long OK. But this is easy enough to handle. > > H. > > On 06/08/2018 10:29 AM, Hadley Wickham wrote: >> I suspect this will have suboptimal performance since the TRUEs will >> get recycled. (Maybe there is, or could be, ALTREP, support for >> recycling) >> Hadley AFAICS, it is not an issue. 
Taking arr <- array(rnorm(2^22),c(2^10,4,4,4)) as a test case and using a function that will either use the literal code `x[idrop=FALSE]' or `eval(mc)': subset_ROW4 <- function(x, i, useLiteral=FALSE) { literal <- quote(x[idrop=FALSE]) mc <- quote(x[i]) nd <- max(1L, length(dim(x))) mc[seq(4,length=nd-1L)] <- rep(TRUE, nd-1L) mc[["drop"]] <- FALSE if (useLiteral) eval(literal) else eval(mc) } I get identical times with system.time(for (i in 1:1) subset_ROW4(arr,seq(1,length=10,by=100),TRUE)) and with system.time(for (i in 1:1) subset_ROW4(arr,seq(1,length=10,by=100),FALSE)) >>> >>> I think that's because you used a relatively low precision timing >>> mechnaism, and included the index generation in the timing. I see: >>> >>> arr <- array(rnorm(2^22),c(2^10,4,4,4)) >>> i <- seq(1,length = 10, by = 100) >>> >>> bench::mark( >>> arr[i, TRUE, TRUE, TRUE], >>> arr[i, , , ] >>> ) >>> #> # A tibble: 2 x 1 >>> #> expressionminmean median max n_gc >>> #> >>> #> 1 arr[i, TRUE,… 7.4µs 10.9µs 10.66µs 1.22ms 2 >>> #> 2 arr[i, , , ] 7.06µs 8.8µs 7.85µs 538.09µs 2 >>> >>> So not a huge difference, but it's there. >> >> >> Funny. I get similar results to yours above albeit with smaller differences. >> Usually < 5 percent. >> >> But with subset_ROW4 I see no consistent difference. >> >> In this example, it runs faster on average using `eval(mc)' to return the >> result: >> >>> arr <- array(rnorm(2^22),c(2^10,4,4,4)) >>> i <- seq(1,length=10,by=100) >>> bench::mark(subset_ROW4(arr,i,FALSE), subset_ROW4(arr,i,TRUE))[,1:8] >> # A tibble: 2 x 8 >> expression min mean median max `itr/sec` >> mem_alloc n_gc >> >> >> 1 subset_ROW4(arr, i, FALSE) 28.9µs 34.9µs 32.1µs 1.36ms28686. >> 5.05KB 5 >> 2 subset_ROW4(arr, i, TRUE)28.9µs 35µs 32.4µs 875.11µs28572. >> 5.05KB 5 >>> >> >> And on subsequent reps the lead switches back and forth. 
>> >> >> Chuck >> > > > > -- > http://hadley.nz > > __ > R-devel@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-devel > __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] Subsetting the "ROW"s of an object
Hmmm, yes, there must be some special case in the C code to avoid recycling a length-1 logical vector: dims <- c(4, 4, 4, 1e5) arr <- array(rnorm(prod(dims)), dims) dim(arr) #> [1] 4 4 4 10 i <- c(1, 3) bench::mark( arr[i, TRUE, TRUE, TRUE], arr[i, , , ] )[c("expression", "min", "mean", "max")] #> # A tibble: 2 x 4 #> expressionmin mean max #> #> 1 arr[i, TRUE, TRUE, TRUE] 41.8ms 43.6ms 46.5ms #> 2 arr[i, , , ] 41.7ms 43.1ms 46.3ms On Fri, Jun 8, 2018 at 12:31 PM, Berry, Charles wrote: > > >> On Jun 8, 2018, at 11:52 AM, Hadley Wickham wrote: >> >> On Fri, Jun 8, 2018 at 11:38 AM, Berry, Charles wrote: >>> >>> On Jun 8, 2018, at 10:37 AM, Hervé Pagès wrote: Also the TRUEs cause problems if some dimensions are 0: > matrix(raw(0), nrow=5, ncol=0)[1:3 , TRUE] Error in matrix(raw(0), nrow = 5, ncol = 0)[1:3, TRUE] : (subscript) logical subscript too long >>> >>> OK. But this is easy enough to handle. >>> H. On 06/08/2018 10:29 AM, Hadley Wickham wrote: > I suspect this will have suboptimal performance since the TRUEs will > get recycled. (Maybe there is, or could be, ALTREP, support for > recycling) > Hadley >>> >>> >>> AFAICS, it is not an issue. Taking >>> >>> arr <- array(rnorm(2^22),c(2^10,4,4,4)) >>> >>> as a test case >>> >>> and using a function that will either use the literal code >>> `x[idrop=FALSE]' or `eval(mc)': >>> >>> subset_ROW4 <- >>> function(x, i, useLiteral=FALSE) >>> { >>>literal <- quote(x[idrop=FALSE]) >>>mc <- quote(x[i]) >>>nd <- max(1L, length(dim(x))) >>>mc[seq(4,length=nd-1L)] <- rep(TRUE, nd-1L) >>>mc[["drop"]] <- FALSE >>>if (useLiteral) >>>eval(literal) >>>else >>>eval(mc) >>> } >>> >>> I get identical times with >>> >>> system.time(for (i in 1:1) >>> subset_ROW4(arr,seq(1,length=10,by=100),TRUE)) >>> >>> and with >>> >>> system.time(for (i in 1:1) >>> subset_ROW4(arr,seq(1,length=10,by=100),FALSE)) >> >> I think that's because you used a relatively low precision timing >> mechnaism, and included the index generation in the timing. 
I see: >> >> arr <- array(rnorm(2^22),c(2^10,4,4,4)) >> i <- seq(1,length = 10, by = 100) >> >> bench::mark( >> arr[i, TRUE, TRUE, TRUE], >> arr[i, , , ] >> ) >> #> # A tibble: 2 x 1 >> #> expressionminmean median max n_gc >> #> >> #> 1 arr[i, TRUE,… 7.4µs 10.9µs 10.66µs 1.22ms 2 >> #> 2 arr[i, , , ] 7.06µs 8.8µs 7.85µs 538.09µs 2 >> >> So not a huge difference, but it's there. > > > Funny. I get similar results to yours above albeit with smaller differences. > Usually < 5 percent. > > But with subset_ROW4 I see no consistent difference. > > In this example, it runs faster on average using `eval(mc)' to return the > result: > >> arr <- array(rnorm(2^22),c(2^10,4,4,4)) >> i <- seq(1,length=10,by=100) >> bench::mark(subset_ROW4(arr,i,FALSE), subset_ROW4(arr,i,TRUE))[,1:8] > # A tibble: 2 x 8 > expression min mean median max `itr/sec` > mem_alloc n_gc > > > 1 subset_ROW4(arr, i, FALSE) 28.9µs 34.9µs 32.1µs 1.36ms28686. > 5.05KB 5 > 2 subset_ROW4(arr, i, TRUE)28.9µs 35µs 32.4µs 875.11µs28572. > 5.05KB 5 >> > > And on subsequent reps the lead switches back and forth. > > > Chuck > -- http://hadley.nz __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
[Rd] Date class shows Inf as NA; this confuses the use of is.na()
In the following example, the date class shows Inf as NA

> as_date(Inf, origin = '1970-01-01')
[1] NA

This is misleading as is.na() reports incorrectly

> is.na(as_date(Inf, origin = '1970-01-01'))
[1] FALSE

The correct approach here would probably be to have an Inf (and -Inf) *displayed* rather than NA.
Re: [Rd] Subsetting the "ROW"s of an object
> On Jun 8, 2018, at 11:52 AM, Hadley Wickham wrote: > > On Fri, Jun 8, 2018 at 11:38 AM, Berry, Charles wrote: >> >> >>> On Jun 8, 2018, at 10:37 AM, Hervé Pagès wrote: >>> >>> Also the TRUEs cause problems if some dimensions are 0: >>> matrix(raw(0), nrow=5, ncol=0)[1:3 , TRUE] >>> Error in matrix(raw(0), nrow = 5, ncol = 0)[1:3, TRUE] : >>> (subscript) logical subscript too long >> >> OK. But this is easy enough to handle. >> >>> >>> H. >>> >>> On 06/08/2018 10:29 AM, Hadley Wickham wrote: I suspect this will have suboptimal performance since the TRUEs will get recycled. (Maybe there is, or could be, ALTREP, support for recycling) Hadley >> >> >> AFAICS, it is not an issue. Taking >> >> arr <- array(rnorm(2^22),c(2^10,4,4,4)) >> >> as a test case >> >> and using a function that will either use the literal code >> `x[idrop=FALSE]' or `eval(mc)': >> >> subset_ROW4 <- >> function(x, i, useLiteral=FALSE) >> { >>literal <- quote(x[idrop=FALSE]) >>mc <- quote(x[i]) >>nd <- max(1L, length(dim(x))) >>mc[seq(4,length=nd-1L)] <- rep(TRUE, nd-1L) >>mc[["drop"]] <- FALSE >>if (useLiteral) >>eval(literal) >>else >>eval(mc) >> } >> >> I get identical times with >> >> system.time(for (i in 1:1) subset_ROW4(arr,seq(1,length=10,by=100),TRUE)) >> >> and with >> >> system.time(for (i in 1:1) >> subset_ROW4(arr,seq(1,length=10,by=100),FALSE)) > > I think that's because you used a relatively low precision timing > mechnaism, and included the index generation in the timing. I see: > > arr <- array(rnorm(2^22),c(2^10,4,4,4)) > i <- seq(1,length = 10, by = 100) > > bench::mark( > arr[i, TRUE, TRUE, TRUE], > arr[i, , , ] > ) > #> # A tibble: 2 x 1 > #> expressionminmean median max n_gc > #> > #> 1 arr[i, TRUE,… 7.4µs 10.9µs 10.66µs 1.22ms 2 > #> 2 arr[i, , , ] 7.06µs 8.8µs 7.85µs 538.09µs 2 > > So not a huge difference, but it's there. Funny. I get similar results to yours above albeit with smaller differences. Usually < 5 percent. But with subset_ROW4 I see no consistent difference. 
In this example, it runs faster on average using `eval(mc)' to return the result: > arr <- array(rnorm(2^22),c(2^10,4,4,4)) > i <- seq(1,length=10,by=100) > bench::mark(subset_ROW4(arr,i,FALSE), subset_ROW4(arr,i,TRUE))[,1:8] # A tibble: 2 x 8 expression min mean median max `itr/sec` mem_alloc n_gc 1 subset_ROW4(arr, i, FALSE) 28.9µs 34.9µs 32.1µs 1.36ms28686. 5.05KB 5 2 subset_ROW4(arr, i, TRUE)28.9µs 35µs 32.4µs 875.11µs28572. 5.05KB 5 > And on subsequent reps the lead switches back and forth. Chuck __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] Subsetting the "ROW"s of an object
A missing subscript is still preferable to a TRUE though because it carries the meaning "take it all". A TRUE also achieves this but via implicit recycling. For example x[ , , ] and x[TRUE, TRUE, TRUE] achieve the same thing (if length(x) != 0) and are both no-ops but the subsetting code gets a chance to immediately and easily detect the former as a no-op whereas it will probably not be able to do it so easily for the latter. So in this case it will most likely generate a copy of 'x' and fill the new array by taking a full walk on it. H. On 06/08/2018 11:52 AM, Hadley Wickham wrote: On Fri, Jun 8, 2018 at 11:38 AM, Berry, Charles wrote: On Jun 8, 2018, at 10:37 AM, Hervé Pagès wrote: Also the TRUEs cause problems if some dimensions are 0: > matrix(raw(0), nrow=5, ncol=0)[1:3 , TRUE] Error in matrix(raw(0), nrow = 5, ncol = 0)[1:3, TRUE] : (subscript) logical subscript too long OK. But this is easy enough to handle. H. On 06/08/2018 10:29 AM, Hadley Wickham wrote: I suspect this will have suboptimal performance since the TRUEs will get recycled. (Maybe there is, or could be, ALTREP, support for recycling) Hadley AFAICS, it is not an issue. Taking arr <- array(rnorm(2^22),c(2^10,4,4,4)) as a test case and using a function that will either use the literal code `x[idrop=FALSE]' or `eval(mc)': subset_ROW4 <- function(x, i, useLiteral=FALSE) { literal <- quote(x[idrop=FALSE]) mc <- quote(x[i]) nd <- max(1L, length(dim(x))) mc[seq(4,length=nd-1L)] <- rep(TRUE, nd-1L) mc[["drop"]] <- FALSE if (useLiteral) eval(literal) else eval(mc) } I get identical times with system.time(for (i in 1:1) subset_ROW4(arr,seq(1,length=10,by=100),TRUE)) and with system.time(for (i in 1:1) subset_ROW4(arr,seq(1,length=10,by=100),FALSE)) I think that's because you used a relatively low precision timing mechnaism, and included the index generation in the timing. 
I see: arr <- array(rnorm(2^22),c(2^10,4,4,4)) i <- seq(1,length = 10, by = 100) bench::mark( arr[i, TRUE, TRUE, TRUE], arr[i, , , ] ) #> # A tibble: 2 x 1 #> expressionminmean median max n_gc #> #> 1 arr[i, TRUE,… 7.4µs 10.9µs 10.66µs 1.22ms 2 #> 2 arr[i, , , ] 7.06µs 8.8µs 7.85µs 538.09µs 2 So not a huge difference, but it's there. Hadley -- Hervé Pagès Program in Computational Biology Division of Public Health Sciences Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N, M1-B514 P.O. Box 19024 Seattle, WA 98109-1024 E-mail: hpa...@fredhutch.org Phone: (206) 667-5791 Fax:(206) 667-1319 __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
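The equivalence described in the message above (for a non-empty array) can be checked directly. The array below is a made-up example; the sketch only compares results and says nothing about what the C code does internally.

```r
# Hypothetical non-empty test array (assumption, not from the thread).
x <- array(1:24, c(2, 3, 4))

by_missing  <- x[ , , ]              # missing subscripts: "take it all"
by_true     <- x[TRUE, TRUE, TRUE]   # same selection via recycled TRUEs
same_result <- identical(by_missing, by_true)
```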
Re: [Rd] Subsetting the "ROW"s of an object
On Fri, Jun 8, 2018 at 11:38 AM, Berry, Charles wrote: > > >> On Jun 8, 2018, at 10:37 AM, Hervé Pagès wrote: >> >> Also the TRUEs cause problems if some dimensions are 0: >> >> > matrix(raw(0), nrow=5, ncol=0)[1:3 , TRUE] >> Error in matrix(raw(0), nrow = 5, ncol = 0)[1:3, TRUE] : >>(subscript) logical subscript too long > > OK. But this is easy enough to handle. > >> >> H. >> >> On 06/08/2018 10:29 AM, Hadley Wickham wrote: >>> I suspect this will have suboptimal performance since the TRUEs will >>> get recycled. (Maybe there is, or could be, ALTREP, support for >>> recycling) >>> Hadley > > > AFAICS, it is not an issue. Taking > > arr <- array(rnorm(2^22),c(2^10,4,4,4)) > > as a test case > > and using a function that will either use the literal code > `x[idrop=FALSE]' or `eval(mc)': > > subset_ROW4 <- > function(x, i, useLiteral=FALSE) > { > literal <- quote(x[idrop=FALSE]) > mc <- quote(x[i]) > nd <- max(1L, length(dim(x))) > mc[seq(4,length=nd-1L)] <- rep(TRUE, nd-1L) > mc[["drop"]] <- FALSE > if (useLiteral) > eval(literal) > else > eval(mc) > } > > I get identical times with > > system.time(for (i in 1:1) subset_ROW4(arr,seq(1,length=10,by=100),TRUE)) > > and with > > system.time(for (i in 1:1) subset_ROW4(arr,seq(1,length=10,by=100),FALSE)) I think that's because you used a relatively low precision timing mechnaism, and included the index generation in the timing. I see: arr <- array(rnorm(2^22),c(2^10,4,4,4)) i <- seq(1,length = 10, by = 100) bench::mark( arr[i, TRUE, TRUE, TRUE], arr[i, , , ] ) #> # A tibble: 2 x 1 #> expressionminmean median max n_gc #> #> 1 arr[i, TRUE,… 7.4µs 10.9µs 10.66µs 1.22ms 2 #> 2 arr[i, , , ] 7.06µs 8.8µs 7.85µs 538.09µs 2 So not a huge difference, but it's there. Hadley -- http://hadley.nz __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] Subsetting the "ROW"s of an object
> On Jun 8, 2018, at 10:37 AM, Hervé Pagès wrote: > > Also the TRUEs cause problems if some dimensions are 0: > > > matrix(raw(0), nrow=5, ncol=0)[1:3 , TRUE] > Error in matrix(raw(0), nrow = 5, ncol = 0)[1:3, TRUE] : >(subscript) logical subscript too long OK. But this is easy enough to handle. > > H. > > On 06/08/2018 10:29 AM, Hadley Wickham wrote: >> I suspect this will have suboptimal performance since the TRUEs will >> get recycled. (Maybe there is, or could be, ALTREP, support for >> recycling) >> Hadley AFAICS, it is not an issue. Taking arr <- array(rnorm(2^22),c(2^10,4,4,4)) as a test case and using a function that will either use the literal code `x[idrop=FALSE]' or `eval(mc)': subset_ROW4 <- function(x, i, useLiteral=FALSE) { literal <- quote(x[idrop=FALSE]) mc <- quote(x[i]) nd <- max(1L, length(dim(x))) mc[seq(4,length=nd-1L)] <- rep(TRUE, nd-1L) mc[["drop"]] <- FALSE if (useLiteral) eval(literal) else eval(mc) } I get identical times with system.time(for (i in 1:1) subset_ROW4(arr,seq(1,length=10,by=100),TRUE)) and with system.time(for (i in 1:1) subset_ROW4(arr,seq(1,length=10,by=100),FALSE)) Changing the dimensions to c(2^5, 2^7, 4, 4 ) and running something similar also shows equal times. Chuck >> On Fri, Jun 8, 2018 at 10:16 AM, Berry, Charles wrote: >>> >>> On Jun 8, 2018, at 8:45 AM, Hadley Wickham wrote: Hi all, Is there a better to way to subset the ROWs (in the sense of NROW) of an vector, matrix, data frame or array than this? 
>>> >>> >>> You can use TRUE to fill the subscripts for dimensions 2:nd >>> subset_ROW <- function(x, i) { nd <- length(dim(x)) if (nd <= 1L) { x[i] } else { dims <- rep(list(quote(expr = )), nd - 1L) do.call(`[`, c(list(quote(x), quote(i)), dims, list(drop = FALSE))) } } >>> >>> >>> subset_ROW <- >>> function(x,i) >>> { >>> mc <- quote(x[i]) >>> nd <- max(1L, length(dim(x))) >>> mc[seq(4, length=nd-1L)] <- rep(list(TRUE), nd - 1L) >>> mc[["drop"]] <- FALSE >>> eval(mc) >>> >>> } >>> subset_ROW(1:10, 4:6) #> [1] 4 5 6 str(subset_ROW(array(1:10, c(10)), 2:4)) #> int [1:3(1d)] 2 3 4 str(subset_ROW(array(1:10, c(10, 1)), 2:4)) #> int [1:3, 1] 2 3 4 str(subset_ROW(array(1:10, c(5, 2)), 2:4)) #> int [1:3, 1:2] 2 3 4 7 8 9 str(subset_ROW(array(1:10, c(10, 1, 1)), 2:4)) #> int [1:3, 1, 1] 2 3 4 subset_ROW(data.frame(x = 1:10, y = 10:1), 2:4) #> x y #> 2 2 9 #> 3 3 8 #> 4 4 7 >>> >>> HTH, >>> >>> Chuck >>> > > -- > Hervé Pagès > > Program in Computational Biology > Division of Public Health Sciences > Fred Hutchinson Cancer Research Center > 1100 Fairview Ave. N, M1-B514 > P.O. Box 19024 > Seattle, WA 98109-1024 > > E-mail: hpa...@fredhutch.org > Phone: (206) 667-5791 > Fax:(206) 667-1319 __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] Subsetting the "ROW"s of an object
Also the TRUEs cause problems if some dimensions are 0: > matrix(raw(0), nrow=5, ncol=0)[1:3 , TRUE] Error in matrix(raw(0), nrow = 5, ncol = 0)[1:3, TRUE] : (subscript) logical subscript too long H. On 06/08/2018 10:29 AM, Hadley Wickham wrote: I suspect this will have suboptimal performance since the TRUEs will get recycled. (Maybe there is, or could be, ALTREP, support for recycling) Hadley On Fri, Jun 8, 2018 at 10:16 AM, Berry, Charles wrote: On Jun 8, 2018, at 8:45 AM, Hadley Wickham wrote: Hi all, Is there a better to way to subset the ROWs (in the sense of NROW) of an vector, matrix, data frame or array than this? You can use TRUE to fill the subscripts for dimensions 2:nd subset_ROW <- function(x, i) { nd <- length(dim(x)) if (nd <= 1L) { x[i] } else { dims <- rep(list(quote(expr = )), nd - 1L) do.call(`[`, c(list(quote(x), quote(i)), dims, list(drop = FALSE))) } } subset_ROW <- function(x,i) { mc <- quote(x[i]) nd <- max(1L, length(dim(x))) mc[seq(4, length=nd-1L)] <- rep(list(TRUE), nd - 1L) mc[["drop"]] <- FALSE eval(mc) } subset_ROW(1:10, 4:6) #> [1] 4 5 6 str(subset_ROW(array(1:10, c(10)), 2:4)) #> int [1:3(1d)] 2 3 4 str(subset_ROW(array(1:10, c(10, 1)), 2:4)) #> int [1:3, 1] 2 3 4 str(subset_ROW(array(1:10, c(5, 2)), 2:4)) #> int [1:3, 1:2] 2 3 4 7 8 9 str(subset_ROW(array(1:10, c(10, 1, 1)), 2:4)) #> int [1:3, 1, 1] 2 3 4 subset_ROW(data.frame(x = 1:10, y = 10:1), 2:4) #> x y #> 2 2 9 #> 3 3 8 #> 4 4 7 HTH, Chuck -- Hervé Pagès Program in Computational Biology Division of Public Health Sciences Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N, M1-B514 P.O. Box 19024 Seattle, WA 98109-1024 E-mail: hpa...@fredhutch.org Phone: (206) 667-5791 Fax:(206) 667-1319 __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
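The zero-extent case reported at the top of this message can be compared against the missing-subscript form. In this sketch the TRUE version is wrapped in tryCatch() so the script keeps running whether or not it fails; the variable names are illustrative only.

```r
# The matrix from the message above: 5 rows, 0 columns.
m0 <- matrix(raw(0), nrow = 5, ncol = 0)

# TRUE as the column subscript: captured so that an error becomes a message string.
with_true <- tryCatch(m0[1:3, TRUE],
                      error = function(e) conditionMessage(e))

# A missing column subscript selects "all zero columns" without complaint.
with_missing <- m0[1:3, , drop = FALSE]
dim_missing  <- dim(with_missing)
```

If the error reported above occurs, with_true holds its message text, while with_missing is a 3 x 0 matrix.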
Re: [Rd] Subsetting the "ROW"s of an object
On 06/08/2018 10:32 AM, Hervé Pagès wrote: On 06/08/2018 10:15 AM, Michael Lawrence wrote: There probably should be an abstraction for this. In S4Vectors, we have extractROWS(). FWIW the code in S4Vectors that does what your subset_ROW() does is: https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_Bioconductor_S4Vectors_blob_04cc9516af986b30445e99fd1337f13321b7b4f6_R_subsetting-2Dutils.R-23L466-2DL476=DwIFaQ=eRAMFD45gAfqt84VtBcfhQ=BK7q3XeAvimeWdGbWY_wJYbW0WYiZvSXAJJKaaPhzWA=LnDTzOeXwI6VI-4SVVi2rwDE7A-az-AhxPAB6X7Lkhc=_2PVGd2BrNNHtPjGsJkhSLAmtX3eoFuZDWWs2c8zZ4w= Wrong link sorry. Here is the correct one: https://github.com/Bioconductor/S4Vectors/blob/04cc9516af986b30445e99fd1337f13321b7b4f6/R/subsetting-utils.R#L453-L464 H. (This is the default "extractROWS" method.) Except for the normalization of 'i', it does the same as your subset_ROW(). I don't know how to do this without generating a call with missing arguments. H. Michael On Fri, Jun 8, 2018 at 8:45 AM, Hadley Wickham wrote: Hi all, Is there a better to way to subset the ROWs (in the sense of NROW) of an vector, matrix, data frame or array than this? subset_ROW <- function(x, i) { nd <- length(dim(x)) if (nd <= 1L) { x[i] } else { dims <- rep(list(quote(expr = )), nd - 1L) do.call(`[`, c(list(quote(x), quote(i)), dims, list(drop = FALSE))) } } subset_ROW(1:10, 4:6) #> [1] 4 5 6 str(subset_ROW(array(1:10, c(10)), 2:4)) #> int [1:3(1d)] 2 3 4 str(subset_ROW(array(1:10, c(10, 1)), 2:4)) #> int [1:3, 1] 2 3 4 str(subset_ROW(array(1:10, c(5, 2)), 2:4)) #> int [1:3, 1:2] 2 3 4 7 8 9 str(subset_ROW(array(1:10, c(10, 1, 1)), 2:4)) #> int [1:3, 1, 1] 2 3 4 subset_ROW(data.frame(x = 1:10, y = 10:1), 2:4) #> x y #> 2 2 9 #> 3 3 8 #> 4 4 7 It seems like there should be a way to do this that doesn't require generating a call with missing arguments, but I can't think of it. Thanks! 
Hadley

--
http://hadley.nz

--
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: hpa...@fredhutch.org
Phone: (206) 667-5791
Fax: (206) 667-1319
Re: [Rd] Subsetting the "ROW"s of an object
On 06/08/2018 10:15 AM, Michael Lawrence wrote: There probably should be an abstraction for this. In S4Vectors, we have extractROWS(). FWIW the code in S4Vectors that does what your subset_ROW() does is: https://github.com/Bioconductor/S4Vectors/blob/04cc9516af986b30445e99fd1337f13321b7b4f6/R/subsetting-utils.R#L466-L476 (This is the default "extractROWS" method.) Except for the normalization of 'i', it does the same as your subset_ROW(). I don't know how to do this without generating a call with missing arguments. H. Michael On Fri, Jun 8, 2018 at 8:45 AM, Hadley Wickham wrote: Hi all, Is there a better to way to subset the ROWs (in the sense of NROW) of an vector, matrix, data frame or array than this? subset_ROW <- function(x, i) { nd <- length(dim(x)) if (nd <= 1L) { x[i] } else { dims <- rep(list(quote(expr = )), nd - 1L) do.call(`[`, c(list(quote(x), quote(i)), dims, list(drop = FALSE))) } } subset_ROW(1:10, 4:6) #> [1] 4 5 6 str(subset_ROW(array(1:10, c(10)), 2:4)) #> int [1:3(1d)] 2 3 4 str(subset_ROW(array(1:10, c(10, 1)), 2:4)) #> int [1:3, 1] 2 3 4 str(subset_ROW(array(1:10, c(5, 2)), 2:4)) #> int [1:3, 1:2] 2 3 4 7 8 9 str(subset_ROW(array(1:10, c(10, 1, 1)), 2:4)) #> int [1:3, 1, 1] 2 3 4 subset_ROW(data.frame(x = 1:10, y = 10:1), 2:4) #> x y #> 2 2 9 #> 3 3 8 #> 4 4 7 It seems like there should be a way to do this that doesn't require generating a call with missing arguments, but I can't think of it. Thanks! 
Hadley

--
http://hadley.nz

--
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: hpa...@fredhutch.org
Phone: (206) 667-5791
Fax: (206) 667-1319
Re: [Rd] Subsetting the "ROW"s of an object
I suspect this will have suboptimal performance since the TRUEs will get recycled. (Maybe there is, or could be, ALTREP, support for recycling) Hadley On Fri, Jun 8, 2018 at 10:16 AM, Berry, Charles wrote: > > >> On Jun 8, 2018, at 8:45 AM, Hadley Wickham wrote: >> >> Hi all, >> >> Is there a better to way to subset the ROWs (in the sense of NROW) of >> an vector, matrix, data frame or array than this? > > > You can use TRUE to fill the subscripts for dimensions 2:nd > >> >> subset_ROW <- function(x, i) { >> nd <- length(dim(x)) >> if (nd <= 1L) { >>x[i] >> } else { >>dims <- rep(list(quote(expr = )), nd - 1L) >>do.call(`[`, c(list(quote(x), quote(i)), dims, list(drop = FALSE))) >> } >> } > > > subset_ROW <- > function(x,i) > { > mc <- quote(x[i]) > nd <- max(1L, length(dim(x))) > mc[seq(4, length=nd-1L)] <- rep(list(TRUE), nd - 1L) > mc[["drop"]] <- FALSE > eval(mc) > > } > >> >> subset_ROW(1:10, 4:6) >> #> [1] 4 5 6 >> >> str(subset_ROW(array(1:10, c(10)), 2:4)) >> #> int [1:3(1d)] 2 3 4 >> str(subset_ROW(array(1:10, c(10, 1)), 2:4)) >> #> int [1:3, 1] 2 3 4 >> str(subset_ROW(array(1:10, c(5, 2)), 2:4)) >> #> int [1:3, 1:2] 2 3 4 7 8 9 >> str(subset_ROW(array(1:10, c(10, 1, 1)), 2:4)) >> #> int [1:3, 1, 1] 2 3 4 >> >> subset_ROW(data.frame(x = 1:10, y = 10:1), 2:4) >> #> x y >> #> 2 2 9 >> #> 3 3 8 >> #> 4 4 7 >> > > HTH, > > Chuck > -- http://hadley.nz __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] Subsetting the "ROW"s of an object
> On Jun 8, 2018, at 8:45 AM, Hadley Wickham wrote: > > Hi all, > > Is there a better to way to subset the ROWs (in the sense of NROW) of > an vector, matrix, data frame or array than this? You can use TRUE to fill the subscripts for dimensions 2:nd > > subset_ROW <- function(x, i) { > nd <- length(dim(x)) > if (nd <= 1L) { >x[i] > } else { >dims <- rep(list(quote(expr = )), nd - 1L) >do.call(`[`, c(list(quote(x), quote(i)), dims, list(drop = FALSE))) > } > } subset_ROW <- function(x,i) { mc <- quote(x[i]) nd <- max(1L, length(dim(x))) mc[seq(4, length=nd-1L)] <- rep(list(TRUE), nd - 1L) mc[["drop"]] <- FALSE eval(mc) } > > subset_ROW(1:10, 4:6) > #> [1] 4 5 6 > > str(subset_ROW(array(1:10, c(10)), 2:4)) > #> int [1:3(1d)] 2 3 4 > str(subset_ROW(array(1:10, c(10, 1)), 2:4)) > #> int [1:3, 1] 2 3 4 > str(subset_ROW(array(1:10, c(5, 2)), 2:4)) > #> int [1:3, 1:2] 2 3 4 7 8 9 > str(subset_ROW(array(1:10, c(10, 1, 1)), 2:4)) > #> int [1:3, 1, 1] 2 3 4 > > subset_ROW(data.frame(x = 1:10, y = 10:1), 2:4) > #> x y > #> 2 2 9 > #> 3 3 8 > #> 4 4 7 > HTH, Chuck __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
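To make the call manipulation in this reply easier to follow, here is a minimal sketch for the two-dimensional case only. It assumes a 5 x 2 integer array; the names `x`, `i`, `mc` and `res` are illustrative, not part of any package.

```r
x <- array(1:10, c(5, 2))
i <- 2:4

# quote(x[i]) is a call with three elements: `[`, x and i.
mc <- quote(x[i])

# Position 4 is the subscript for dimension 2; fill it with TRUE.
mc[4] <- list(TRUE)

# Append the named argument, giving x[i, TRUE, drop = FALSE].
mc[["drop"]] <- FALSE

res <- eval(mc)
str(res)
```

The call is evaluated in the frame where `x` and `i` live, which is why the function version can simply use eval(mc).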
Re: [Rd] Subsetting the "ROW"s of an object
There probably should be an abstraction for this. In S4Vectors, we have extractROWS(). Michael On Fri, Jun 8, 2018 at 8:45 AM, Hadley Wickham wrote: > Hi all, > > Is there a better to way to subset the ROWs (in the sense of NROW) of > an vector, matrix, data frame or array than this? > > subset_ROW <- function(x, i) { > nd <- length(dim(x)) > if (nd <= 1L) { > x[i] > } else { > dims <- rep(list(quote(expr = )), nd - 1L) > do.call(`[`, c(list(quote(x), quote(i)), dims, list(drop = FALSE))) > } > } > > subset_ROW(1:10, 4:6) > #> [1] 4 5 6 > > str(subset_ROW(array(1:10, c(10)), 2:4)) > #> int [1:3(1d)] 2 3 4 > str(subset_ROW(array(1:10, c(10, 1)), 2:4)) > #> int [1:3, 1] 2 3 4 > str(subset_ROW(array(1:10, c(5, 2)), 2:4)) > #> int [1:3, 1:2] 2 3 4 7 8 9 > str(subset_ROW(array(1:10, c(10, 1, 1)), 2:4)) > #> int [1:3, 1, 1] 2 3 4 > > subset_ROW(data.frame(x = 1:10, y = 10:1), 2:4) > #> x y > #> 2 2 9 > #> 3 3 8 > #> 4 4 7 > > It seems like there should be a way to do this that doesn't require > generating a call with missing arguments, but I can't think of it. > > Thanks! > > Hadley > > -- > http://hadley.nz > > __ > R-devel@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-devel > __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] Subsetting the "ROW"s of an object
Sorry, without remnants from other attempts:

subset_ROW <- function(x, i) {
  nd <- length(dim(x))
  if (nd <= 1L)
    return(x[i])
  apply(x, 2:nd, `[`, i, drop=FALSE)
}

On Fri, 8 Jun 2018 at 19:07, Iñaki Úcar wrote:
>
> On Fri, 8 Jun 2018 at 17:46, Hadley Wickham wrote:
> >
> > Hi all,
> >
> > Is there a better way to subset the ROWs (in the sense of NROW) of
> > a vector, matrix, data frame or array than this?
> >
> > subset_ROW <- function(x, i) {
> >   nd <- length(dim(x))
> >   if (nd <= 1L) {
> >     x[i]
> >   } else {
> >     dims <- rep(list(quote(expr = )), nd - 1L)
> >     do.call(`[`, c(list(quote(x), quote(i)), dims, list(drop = FALSE)))
> >   }
> > }
> >
> > subset_ROW(1:10, 4:6)
> > #> [1] 4 5 6
> >
> > str(subset_ROW(array(1:10, c(10)), 2:4))
> > #>  int [1:3(1d)] 2 3 4
> > str(subset_ROW(array(1:10, c(10, 1)), 2:4))
> > #>  int [1:3, 1] 2 3 4
> > str(subset_ROW(array(1:10, c(5, 2)), 2:4))
> > #>  int [1:3, 1:2] 2 3 4 7 8 9
> > str(subset_ROW(array(1:10, c(10, 1, 1)), 2:4))
> > #>  int [1:3, 1, 1] 2 3 4
> >
> > subset_ROW(data.frame(x = 1:10, y = 10:1), 2:4)
> > #>   x y
> > #> 2 2 9
> > #> 3 3 8
> > #> 4 4 7
> >
> > It seems like there should be a way to do this that doesn't require
> > generating a call with missing arguments, but I can't think of it.
>
> The following code seems to work. The only minor drawback is that, for
> the last case, the output is not a data frame.
>
> subset_ROW <- function(x, i) {
>   nd <- length(dim(x))
>   if (nd <= 1L)
>     return(x[i])
>   xx <- apply(x, 2:nd, `[`, i, drop=FALSE)
>   dim(xx) <- c(length(i), dim(x)[-1])
>   xx
> }
>
> Iñaki
>
> >
> > Thanks!
> >
> > Hadley
> >
> > --
> > http://hadley.nz
>

--
Iñaki Úcar
http://www.enchufa2.es
@Enchufa2
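One case worth checking with the two apply()-based variants in this sub-thread is a single selected row, because apply() simplifies its result when each call to the function returns a value of length 1. The sketch below uses hypothetical names (`rows_nodim`, `rows_dim`) for the two variants so they can be compared side by side.

```r
# Variant without the explicit dim<- step.
rows_nodim <- function(x, i) {
  nd <- length(dim(x))
  if (nd <= 1L) return(x[i])
  apply(x, 2:nd, `[`, i, drop = FALSE)
}

# Variant that restores the dimensions after apply().
rows_dim <- function(x, i) {
  nd <- length(dim(x))
  if (nd <= 1L) return(x[i])
  xx <- apply(x, 2:nd, `[`, i, drop = FALSE)
  dim(xx) <- c(length(i), dim(x)[-1])
  xx
}

x <- array(1:10, c(5, 2))

# With several rows both variants return a 3 x 2 matrix.
a <- rows_nodim(x, 2:4)
b <- rows_dim(x, 2:4)

# With one row, apply() returns a plain vector, so only the second
# variant keeps the 1 x 2 shape.
one_nodim <- rows_nodim(x, 2)
one_dim <- rows_dim(x, 2)
```

So the dim<- line may be more than a remnant if length-1 subscripts need to keep their array shape.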
Re: [Rd] Subsetting the "ROW"s of an object
On Fri, 8 Jun 2018 at 17:46, Hadley Wickham wrote:
>
> Hi all,
>
> Is there a better way to subset the ROWs (in the sense of NROW) of
> a vector, matrix, data frame or array than this?
>
> subset_ROW <- function(x, i) {
>   nd <- length(dim(x))
>   if (nd <= 1L) {
>     x[i]
>   } else {
>     dims <- rep(list(quote(expr = )), nd - 1L)
>     do.call(`[`, c(list(quote(x), quote(i)), dims, list(drop = FALSE)))
>   }
> }
>
> subset_ROW(1:10, 4:6)
> #> [1] 4 5 6
>
> str(subset_ROW(array(1:10, c(10)), 2:4))
> #>  int [1:3(1d)] 2 3 4
> str(subset_ROW(array(1:10, c(10, 1)), 2:4))
> #>  int [1:3, 1] 2 3 4
> str(subset_ROW(array(1:10, c(5, 2)), 2:4))
> #>  int [1:3, 1:2] 2 3 4 7 8 9
> str(subset_ROW(array(1:10, c(10, 1, 1)), 2:4))
> #>  int [1:3, 1, 1] 2 3 4
>
> subset_ROW(data.frame(x = 1:10, y = 10:1), 2:4)
> #>   x y
> #> 2 2 9
> #> 3 3 8
> #> 4 4 7
>
> It seems like there should be a way to do this that doesn't require
> generating a call with missing arguments, but I can't think of it.

The following code seems to work. The only minor drawback is that, for
the last case, the output is not a data frame.

subset_ROW <- function(x, i) {
  nd <- length(dim(x))
  if (nd <= 1L)
    return(x[i])
  xx <- apply(x, 2:nd, `[`, i, drop=FALSE)
  dim(xx) <- c(length(i), dim(x)[-1])
  xx
}

Iñaki

>
> Thanks!
>
> Hadley
>
> --
> http://hadley.nz
>
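The data frame drawback mentioned in this reply comes from apply() converting its input to a matrix first. A minimal sketch, assuming a data frame with one integer and one character column (the names `df`, `res_apply` and `res_index` are illustrative):

```r
df <- data.frame(x = 1:3, y = c("a", "b", "c"))
i <- 2:3

# apply() calls as.matrix() on a data frame, so mixed column types
# are coerced to a common type (character here).
res_apply <- apply(df, 2, `[`, i, drop = FALSE)

# Direct indexing with a missing column subscript keeps the data frame
# and the type of each column.
res_index <- df[i, , drop = FALSE]

class(res_apply)
class(res_index)
```

For data frames with columns of different types the loss is therefore not only the class but also the column types.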
[Rd] Subsetting the "ROW"s of an object
Hi all,

Is there a better way to subset the ROWs (in the sense of NROW) of
a vector, matrix, data frame or array than this?

subset_ROW <- function(x, i) {
  nd <- length(dim(x))
  if (nd <= 1L) {
    x[i]
  } else {
    dims <- rep(list(quote(expr = )), nd - 1L)
    do.call(`[`, c(list(quote(x), quote(i)), dims, list(drop = FALSE)))
  }
}

subset_ROW(1:10, 4:6)
#> [1] 4 5 6

str(subset_ROW(array(1:10, c(10)), 2:4))
#>  int [1:3(1d)] 2 3 4
str(subset_ROW(array(1:10, c(10, 1)), 2:4))
#>  int [1:3, 1] 2 3 4
str(subset_ROW(array(1:10, c(5, 2)), 2:4))
#>  int [1:3, 1:2] 2 3 4 7 8 9
str(subset_ROW(array(1:10, c(10, 1, 1)), 2:4))
#>  int [1:3, 1, 1] 2 3 4

subset_ROW(data.frame(x = 1:10, y = 10:1), 2:4)
#>   x y
#> 2 2 9
#> 3 3 8
#> 4 4 7

It seems like there should be a way to do this that doesn't require
generating a call with missing arguments, but I can't think of it.

Thanks!

Hadley

--
http://hadley.nz
Re: [R-pkg-devel] pkg built with static vignette introduces dependency on R >= 3.5.0
On 08/06/2018 5:12 AM, Roman Flury wrote: Dear all, I'm working on a package which contains a static vignette. If the pkg is built with R version 3.3.3 everything works fine, but if built with the current R-devel version I get the warning: NB: this package now depends on R (>= 3.5.0) WARNING: Added dependency on R >= 3.5.0 because serialized objects in serialize/load version 3 cannot be read in older versions of R. File(s) containing such objects: 'staticvignettepkg/build/vignette.rds' and as described the dependency on R >= 3.5.0 is added to the DESCRIPTION file. I found possible context for this behaviour in the R-devel NEWS https://cran.r-project.org/doc/manuals/r-devel/NEWS.html: ''R has new serialization format (version 3) which supports custom serialization of ALTREP framework objects. These objects can still be serialized in format 2, but less efficiently. Serialization format 3 also records the current native encoding of unflagged strings and converts them when de-serialized in R running under different native encoding. Format 3 comes with new serialization magic numbers (RDA3, RDB3, RDX3). Format 3 can be selected by version = 3 in save(), serialize() and saveRDS(), but format 2 remains the default for all serialization and saving of the workspace. Serialized data in format 3 cannot be read by versions of R prior to version 3.5.0.'' but I can not see why or how this should have an influence on a static vignette? To illustrate and reproduce my issue I created a git repository https://github.com/romanflury/staticvignette with a minimal package, containing an arbitrary pdf document as a static vignette. The git repository includes the respective session infos also. I hope to avoid this dependency, since I do not want to force users to update their R version. When R builds a package with vignettes, it adds a file build/vignette.rds to the tarball that contains information about the vignettes. 
Since R-devel is switching the format of .rds files, this file is in the new format, which can't be read by R versions prior to 3.5.0. Generally speaking there is no guarantee that R x.y.z can handle a package built in a later version, and this is an example of that problem: R x.y.z can't handle a package built in x.(y+2).z. So the solution is to build your tarball in R 3.5.x or earlier, not in R-devel, or to add the dependency mentioned in the warning message. Duncan Murdoch __ R-package-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-package-devel
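For readers who want to see which serialization format a given object was written in, here is a small sketch using serialize() with an in-memory result. It only inspects the format; it does not change what R CMD build writes to build/vignette.rds. The helper name `fmt_version` is hypothetical, and the byte layout described in the comments applies to the default binary (XDR) format; `version = 3` needs R >= 3.5.0.

```r
obj <- list(a = 1:3)

# Serialize the same object in both formats; connection = NULL returns
# the bytes as a raw vector instead of writing to a file.
v2 <- serialize(obj, connection = NULL, version = 2)
v3 <- serialize(obj, connection = NULL, version = 3)

# In the default XDR format the first two bytes are "X\n" and the next
# four bytes hold the format version as a big-endian integer.
fmt_version <- function(bytes) {
  readBin(bytes[3:6], what = "integer", size = 4, endian = "big")
}

fmt_version(v2)
fmt_version(v3)
```

An .rds file is normally compressed, so the same check on a file would have to read through gzfile() rather than look at the raw file bytes.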
[R-pkg-devel] pkg built with static vignette introduces dependency on R >= 3.5.0
Dear all, I'm working on a package which contains a static vignette. If the pkg is built with R version 3.3.3 everything works fine, but if built with the current R-devel version I get the warning: > NB: this package now depends on R (>= 3.5.0) > WARNING: Added dependency on R >= 3.5.0 because serialized objects in > serialize/load version 3 cannot be read in older versions of R. File(s) > containing such objects: 'staticvignettepkg/build/vignette.rds' and as described the dependency on R >= 3.5.0 is added to the DESCRIPTION file. I found possible context for this behaviour in the R-devel NEWS https://cran.r-project.org/doc/manuals/r-devel/NEWS.html: ''R has new serialization format (version 3) which supports custom serialization of ALTREP framework objects. These objects can still be serialized in format 2, but less efficiently. Serialization format 3 also records the current native encoding of unflagged strings and converts them when de-serialized in R running under different native encoding. Format 3 comes with new serialization magic numbers (RDA3, RDB3, RDX3). Format 3 can be selected by version = 3 in save(), serialize() and saveRDS(), but format 2 remains the default for all serialization and saving of the workspace. Serialized data in format 3 cannot be read by versions of R prior to version 3.5.0.'' but I can not see why or how this should have an influence on a static vignette? To illustrate and reproduce my issue I created a git repository https://github.com/romanflury/staticvignette with a minimal package, containing an arbitrary pdf document as a static vignette. The git repository includes the respective session infos also. I hope to avoid this dependency, since I do not want to force users to update their R version. Many thanks, Roman __ R-package-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-package-devel
Re: [Bioc-devel] "Reviving" an existing workflow
Hello Lori, thanks for your support! The workflow package is now on the tracker: https://github.com/Bioconductor/Contributions/issues/765 Cheers, Bernd On Tue, 2018-06-05 at 15:33 +, Shepherd, Lori wrote: > Excellent news! The RAM requirement was a legacy from the old > builders and we have since improved the capabilities so you should be > alright with the package as is. I will look at updating the > documentation on the website and thank you for bringing it to our > attention. > > We look forward to your submission on the tracker. > > Cheers, > > Lori Shepherd > Bioconductor Core Team > Roswell Park Cancer Institute > Department of Biostatistics & Bioinformatics > Elm & Carlton Streets > Buffalo, New York 14263 > From: Bernd Klaus > Sent: Tuesday, June 5, 2018 2:53:41 AM > To: Shepherd, Lori; bioc-devel > Cc: Stefanie Reisenauer > Subject: Re: [Bioc-devel] "Reviving" an existing workflow > > Hello Lori, > > following up on this, we have now finished the > revision of the workflow and are ready to submit it to > Bioc: > > https://github.com/Steffireise/maEndToEnd > > However, using a call to gc() at the end of the > workflow and rendering it via rmarkdown::render > gives a memory usage of ~ 4.5 GB, > while the guidelines: > > http://bioconductor.org/developers/how-to/workflows/ > > say that "The package should require <= 4GB RAM" > > Is it okay to submit the workflow as it is, or > do we have to look into splitting it up as in: > > http://bioconductor.org/packages/devel/workflows/html/simpleSingleCell.html > > ? > > Thanks and best wishes, > > Bernd > > > On Thu, 2018-05-03 at 18:41 +0200, Bernd Klaus wrote: > > Hello Lori, > > > > thanks a lot for your prompt and constructive feedback! > > > > We will revise the workflow accordingly and then submit it > > to the submission tracker. > > > > Cheers, > > > > Bernd > > > > On Thu, 2018-05-03 at 15:50 +, Shepherd, Lori wrote: > > > > > > Hello Bernd, > > > > > > Thanks for reaching out. 
It looks like the last build on Jenkins > > > was two release cycles ago (Oct 2). Because of the time lapse, > > > because there have been quite a few changes to depending packages, > > > and because it was never "published", we would recommend putting the > > > workflow on the submission tracker. > > > > > > We understand that it isn't a new submission, but this will allow > > > the debugging process to be separate from the daily builder. > > > > > > https://github.com/Bioconductor/Contributions > > > > > > > > > Thank you for converting your workflow to a package. Please also > > > review http://bioconductor.org/developers/how-to/workflows/ as it > > > outlines some important aspects to implement that are new from when the > > > workflow was originally submitted. > > > Most importantly: > > > Add "Workflow: True" as a field in the DESCRIPTION > > > Update the biocViews terms to be part of the workflow > > > biocViews: http://bioconductor.org/packages/devel/BiocViews.html#___Workflow > > > See the consistent formatting section. We will require that the > > > vignette be updated to use BiocStyle and include author and affiliations, > > > date, and versioning information. > > > > > > > > > I did try to build your package locally and ran into quite a few > > > issues. 
> > > The first run: > > > Quitting from lines 1510-1512 (MA-Workflow.Rmd) > > > Error: processing vignette 'MA-Workflow.Rmd' failed with > > > diagnostics: > > > package Rgraphviz is required > > > > > > So it seems like you need to add Rgraphviz > > > > > > The second run after installing Rgraphviz: > > > Quitting from lines 1578-1579 (MA-Workflow.Rmd) > > > Error: processing vignette 'MA-Workflow.Rmd' failed with > > > diagnostics: > > > could not find function "enrichMap" > > > > > > I think this stems from the fact that DOSE moved a lot of their > > > plotting functions or the plotting functionality to the enrichplot > > > package, but it is something that will have to be remedied. > > > > > > I also would git rm the vignette/MA-Workflow.html - this should be > > > generated automatically from the Rmd file and including a version > > > could result in a stale copy. > > > > > > Just on a quick glance at the vignette - it seems like you call set.seed > > > in the vignette but that code is not exposed to the user - assuming > > > this would be important to show so that a user could reproduce your > > > work. > > > > > > We look forward to getting this workflow active again and on the > > > builder. > > > > > > Cheers, > > > > > > Lori Shepherd > > > Bioconductor Core Team > > > Roswell Park Cancer Institute > > > Department of Biostatistics & Bioinformatics > > > Elm & Carlton Streets > > > Buffalo, New York 14263 > > > From: Bioc-devel on behalf of > > > Bernd Klaus > > > Sent: Thursday, May 3, 2018 3:03:20 AM > > > To: bioc-devel > > > Subject: [Bioc-devel] "Reviving" an existing