Re: [Rd] Date class shows Inf as NA; this confuses the use of is.na()

2018-06-08 Thread Werner Grundlingh
Indeed. as_date is from lubridate, but the same holds for as.Date.

The output and its interpretation should be consistent, otherwise it leads
to confusion when programming. I understand that the difference exists
after asking a question on Stack Overflow:
  https://stackoverflow.com/q/50766089/914686
This behavior is never mentioned in the documentation, namely that an Inf
date is actually displayed as NA:
  https://www.rdocumentation.org/packages/base/versions/3.5.0/topics/as.Date
So I'm of the impression that the display should be fixed as a first option
(thereby providing clarity/transparency in terms of back-end and output),
or the documentation amended (to highlight this) as a second option.

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Subsetting the "ROW"s of an object

2018-06-08 Thread Berry, Charles


> On Jun 8, 2018, at 2:15 PM, Hadley Wickham  wrote:
> 
> On Fri, Jun 8, 2018 at 2:09 PM, Berry, Charles  wrote:
>> 
>> 
>>> On Jun 8, 2018, at 1:49 PM, Hadley Wickham  wrote:
>>> 
>>> Hmmm, yes, there must be some special case in the C code to avoid
>>> recycling a length-1 logical vector:
>> 
>> 
>> Here is a version that (I think) handles Herve's issue of arrays having one 
>> or more 0 dimensions.
>> 
>> subset_ROW <-
>>     function(x,i)
>> {
>>     dims <- dim(x)
>>     index_list <- which(dims[-1] != 0L) + 3
>>     mc <- quote(x[i])
>>     nd <- max(1L, length(dims))
>>     mc[ index_list ] <- list(TRUE)
>>     mc[[ nd + 3L ]] <- FALSE
>>     names( mc )[ nd+3L ] <- "drop"
>>     eval(mc)
>> }
>> 
>> Curiously enough the timing is *much* better for this implementation than 
>> for the first version I sent.
>> 
>> Constructing a version of `mc' that looks like `x[i,,,,drop=FALSE]' can be
>> done with `alist(a=)' in place of `list(TRUE)' in the earlier version but 
>> seems to slow things down noticeably. It requires almost twice (!!) as much 
>> time as the version above.
> 
> I think that's probably because alist() is a slow way to generate a
> missing symbol:
> 
> bench::mark(
>  alist(x = ),
>  list(x = quote(expr = )),
>  check = FALSE
> )[1:5]
> #> # A tibble: 2 x 5
> #>   expression                    min     mean   median      max
> #> 1 alist(x = )                 2.8µs   3.54µs   3.29µs   34.9µs
> #> 2 list(x = quote(expr = ))    169ns 219.38ns    181ns   24.2µs
> 
> (note the units)

Yes. That is good for about half the difference. And I guess the rest is 
getting rid of seq(). This seems a bit quicker than anything else and satisfies 
Herve's objections:

subset_ROW <-
  function(x,i)
  {
  dims <- dim(x)
  nd <- length(dims)
  index_list <- if (nd > 1) 2L + 2L:nd else 0
  mc <- quote(x[i])
  mc[ index_list ] <- list(quote(expr=))
  mc[[ "drop" ]] <- FALSE
  eval(mc)
  }
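
As a quick check of the version above, the following sketch restates the function so that it runs on its own and then tries it on a plain vector, a matrix, a 3-d array, a data frame, and a matrix with a zero-length dimension (Herve's case). The test inputs are illustrative choices, not taken from the message.

```r
# Restated from the message above so that the block is self-contained.
subset_ROW <- function(x, i) {
  dims <- dim(x)
  nd <- length(dims)
  # Positions 4, 5, ... of the call x[i] hold the subscripts for dims 2..nd.
  index_list <- if (nd > 1) 2L + 2L:nd else 0
  mc <- quote(x[i])
  mc[index_list] <- list(quote(expr = ))  # fill with missing arguments
  mc[["drop"]] <- FALSE
  eval(mc)
}

v  <- subset_ROW(1:10, 4:6)                              # plain vector
m  <- subset_ROW(array(1:10, c(5, 2)), 2:4)              # matrix
a3 <- subset_ROW(array(1:10, c(10, 1, 1)), 2:4)          # 3-d array
df <- subset_ROW(data.frame(x = 1:10, y = 10:1), 2:4)    # data frame
z  <- subset_ROW(matrix(raw(0), nrow = 5, ncol = 0), 1:3) # zero columns
```

Because the trailing subscripts are missing rather than TRUE, the zero-column case goes through the same code path as the others.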

Chuck


Re: [Rd] Subsetting the "ROW"s of an object

2018-06-08 Thread Hervé Pagès




On 06/08/2018 02:15 PM, Hadley Wickham wrote:

On Fri, Jun 8, 2018 at 2:09 PM, Berry, Charles  wrote:




On Jun 8, 2018, at 1:49 PM, Hadley Wickham  wrote:

Hmmm, yes, there must be some special case in the C code to avoid
recycling a length-1 logical vector:



Here is a version that (I think) handles Herve's issue of arrays having one or 
more 0 dimensions.

subset_ROW <-
 function(x,i)
{
 dims <- dim(x)
 index_list <- which(dims[-1] != 0L) + 3
 mc <- quote(x[i])
 nd <- max(1L, length(dims))
 mc[ index_list ] <- list(TRUE)
 mc[[ nd + 3L ]] <- FALSE
 names( mc )[ nd+3L ] <- "drop"
 eval(mc)
}

Curiously enough the timing is *much* better for this implementation than for 
the first version I sent.

Constructing a version of `mc' that looks like `x[i,,,,drop=FALSE]' can be done
with `alist(a=)' in place of `list(TRUE)' in the earlier version but seems to 
slow things down noticeably. It requires almost twice (!!) as much time as the 
version above.


I think that's probably because alist() is a slow way to generate a
missing symbol:

bench::mark(
   alist(x = ),
   list(x = quote(expr = )),
   check = FALSE
)[1:5]
#> # A tibble: 2 x 5
#>   expression                    min     mean   median      max
#> 1 alist(x = )                 2.8µs   3.54µs   3.29µs   34.9µs
#> 2 list(x = quote(expr = ))    169ns 219.38ns    181ns   24.2µs

(note the units)


That's a good one. Need to change this in S4Vectors::default_extractROWS()
and other places. Thanks!

H.



Hadley




--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fredhutch.org
Phone:  (206) 667-5791
Fax:(206) 667-1319



Re: [Rd] Subsetting the "ROW"s of an object

2018-06-08 Thread Hadley Wickham
On Fri, Jun 8, 2018 at 2:09 PM, Berry, Charles  wrote:
>
>
>> On Jun 8, 2018, at 1:49 PM, Hadley Wickham  wrote:
>>
>> Hmmm, yes, there must be some special case in the C code to avoid
>> recycling a length-1 logical vector:
>
>
> Here is a version that (I think) handles Herve's issue of arrays having one 
> or more 0 dimensions.
>
> subset_ROW <-
> function(x,i)
> {
> dims <- dim(x)
> index_list <- which(dims[-1] != 0L) + 3
> mc <- quote(x[i])
> nd <- max(1L, length(dims))
> mc[ index_list ] <- list(TRUE)
> mc[[ nd + 3L ]] <- FALSE
> names( mc )[ nd+3L ] <- "drop"
> eval(mc)
> }
>
> Curiously enough the timing is *much* better for this implementation than for 
> the first version I sent.
>
> Constructing a version of `mc' that looks like `x[i,,,,drop=FALSE]' can be
> done with `alist(a=)' in place of `list(TRUE)' in the earlier version but 
> seems to slow things down noticeably. It requires almost twice (!!) as much 
> time as the version above.

I think that's probably because alist() is a slow way to generate a
missing symbol:

bench::mark(
  alist(x = ),
  list(x = quote(expr = )),
  check = FALSE
)[1:5]
#> # A tibble: 2 x 5
#>   expression                    min     mean   median      max
#> 1 alist(x = )                 2.8µs   3.54µs   3.29µs   34.9µs
#> 2 list(x = quote(expr = ))    169ns 219.38ns    181ns   24.2µs

(note the units)
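
To make the comparison concrete: both expressions store the same empty symbol (the "missing argument") in a list, so either can be spliced into a call to produce empty subscripts. A minimal base-R sketch follows; the variable names are illustrative only.

```r
# Two ways to obtain a list holding the empty ("missing") symbol.
a <- alist(x = )
b <- list(x = quote(expr = ))
same <- identical(a, b)

# Splice two empty subscripts into a call equivalent to x[i, , , drop = FALSE].
empty <- rep(list(quote(expr = )), 2)
cl <- as.call(c(list(as.name("["), quote(x), quote(i)), empty, list(drop = FALSE)))
txt <- deparse(cl)

# Evaluate the constructed call on a small 3-d array.
x <- array(1:24, c(2, 3, 4))
i <- 2L
res <- eval(cl)
```

The lists are compared as a whole because passing the empty symbol itself as a function argument raises a "missing argument" error.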

Hadley


-- 
http://hadley.nz



Re: [Rd] Subsetting the "ROW"s of an object

2018-06-08 Thread Berry, Charles



> On Jun 8, 2018, at 1:49 PM, Hadley Wickham  wrote:
> 
> Hmmm, yes, there must be some special case in the C code to avoid
> recycling a length-1 logical vector:


Here is a version that (I think) handles Herve's issue of arrays having one or 
more 0 dimensions.

subset_ROW <-
    function(x,i)
{
    dims <- dim(x)
    index_list <- which(dims[-1] != 0L) + 3
    mc <- quote(x[i])
    nd <- max(1L, length(dims))
    mc[ index_list ] <- list(TRUE)
    mc[[ nd + 3L ]] <- FALSE
    names( mc )[ nd+3L ] <- "drop"
    eval(mc)
}

Curiously enough the timing is *much* better for this implementation than for 
the first version I sent.

Constructing a version of `mc' that looks like `x[i,,,,drop=FALSE]' can be done
with `alist(a=)' in place of `list(TRUE)' in the earlier version but seems to 
slow things down noticeably. It requires almost twice (!!) as much time as the 
version above.

Best,

Chuck


Re: [Rd] Date class shows Inf as NA; this confuses the use of is.na()

2018-06-08 Thread MacQueen, Don via R-devel
> as_date
Error: object 'as_date' not found

Must be from some not-named package...

But don't confuse the format of an object when printed with its underlying 
value:

> as.Date(Inf,origin = '1970-01-01')
[1] NA

> str(as.Date(Inf,origin = '1970-01-01'))
 Date[1:1], format: NA

> as.numeric(as.Date(Inf,origin = '1970-01-01'))
[1] Inf

> is.na(Inf)
[1] FALSE

> is.na(as.Date(Inf,origin = '1970-01-01'))
[1] FALSE

> str(as.Date(27,origin = '1970-01-01'))
 Date[1:1], format: "1970-01-28"

> as.numeric(as.Date(27,origin = '1970-01-01'))
[1] 27
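
The distinction above can be put into a small helper. The sketch below, in base R, flags dates whose underlying number is not finite, which is.na() alone does not catch. The name is_nonfinite_date is made up for illustration; it is not part of base R or lubridate.

```r
d <- as.Date(c(27, Inf, -Inf, NA), origin = "1970-01-01")

# Hypothetical helper: TRUE where the stored number is NA, NaN, Inf or -Inf.
is_nonfinite_date <- function(x) !is.finite(as.numeric(x))

na_flags        <- is.na(d)              # only the true NA is flagged
nonfinite_flags <- is_nonfinite_date(d)  # Inf and -Inf are flagged as well
```

Code that needs to treat every date printed as NA alike can test with the helper instead of is.na().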

--
Don MacQueen
Lawrence Livermore National Laboratory
7000 East Ave., L-627
Livermore, CA 94550
925-423-1062
Lab cell 925-724-7509
 
 

On 6/8/18, 1:02 PM, "R-devel on behalf of Werner Grundlingh" 
 wrote:

In the following example, the date class shows Inf as NA

> as_date(Inf, origin = '1970-01-01')
[1] NA

This is misleading as is.na() reports incorrectly

> is.na(as_date(Inf, origin = '1970-01-01'))
[1] FALSE

The correct approach here would probably be to have an Inf (and -Inf)
*displayed* rather than NA.



Re: [Rd] Subsetting the "ROW"s of an object

2018-06-08 Thread Hervé Pagès

The C code for subsetting doesn't need to recycle a logical subscript.
It only needs to walk on it and start again at the beginning of the
vector when it reaches the end. Not exactly the same as detecting the
"take everything along that dimension" situation though.
x[TRUE, TRUE, TRUE] triggers the full subsetting machinery when x[]
and x[ , , ] could (and should) easily avoid it.

H.

On 06/08/2018 01:49 PM, Hadley Wickham wrote:

Hmmm, yes, there must be some special case in the C code to avoid
recycling a length-1 logical vector:

dims <- c(4, 4, 4, 1e5)

arr <- array(rnorm(prod(dims)), dims)
dim(arr)
#> [1]      4      4      4 100000
i <- c(1, 3)

bench::mark(
   arr[i, TRUE, TRUE, TRUE],
   arr[i, , , ]
)[c("expression", "min", "mean", "max")]
#> # A tibble: 2 x 4
#>   expression                    min     mean      max
#> 1 arr[i, TRUE, TRUE, TRUE]   41.8ms   43.6ms   46.5ms
#> 2 arr[i, , , ]               41.7ms   43.1ms   46.3ms


On Fri, Jun 8, 2018 at 12:31 PM, Berry, Charles  wrote:




On Jun 8, 2018, at 11:52 AM, Hadley Wickham  wrote:

On Fri, Jun 8, 2018 at 11:38 AM, Berry, Charles  wrote:




On Jun 8, 2018, at 10:37 AM, Hervé Pagès  wrote:

Also the TRUEs cause problems if some dimensions are 0:


matrix(raw(0), nrow=5, ncol=0)[1:3 , TRUE]

Error in matrix(raw(0), nrow = 5, ncol = 0)[1:3, TRUE] :
   (subscript) logical subscript too long


OK. But this is easy enough to handle.



H.

On 06/08/2018 10:29 AM, Hadley Wickham wrote:

I suspect this will have suboptimal performance since the TRUEs will
get recycled. (Maybe there is, or could be, ALTREP, support for
recycling)
Hadley



AFAICS, it is not an issue. Taking

arr <- array(rnorm(2^22),c(2^10,4,4,4))

as a test case

and using a function that will either use the literal code `x[i,,,,drop=FALSE]'
or `eval(mc)':

subset_ROW4 <-
 function(x, i, useLiteral=FALSE)
{
literal <- quote(x[i,,,,drop=FALSE])
mc <- quote(x[i])
nd <- max(1L, length(dim(x)))
mc[seq(4,length=nd-1L)] <- rep(TRUE, nd-1L)
mc[["drop"]] <- FALSE
if (useLiteral)
eval(literal)
else
eval(mc)
}

I get identical times with

system.time(for (i in 1:1) subset_ROW4(arr,seq(1,length=10,by=100),TRUE))

and with

system.time(for (i in 1:1) subset_ROW4(arr,seq(1,length=10,by=100),FALSE))


I think that's because you used a relatively low precision timing
mechanism, and included the index generation in the timing. I see:

arr <- array(rnorm(2^22),c(2^10,4,4,4))
i <- seq(1,length = 10, by = 100)

bench::mark(
  arr[i, TRUE, TRUE, TRUE],
  arr[i, , , ]
)
#> # A tibble: 2 x 1
#>   expression         min     mean   median      max  n_gc
#> 1 arr[i, TRUE,…    7.4µs   10.9µs  10.66µs   1.22ms     2
#> 2 arr[i, , , ]    7.06µs    8.8µs   7.85µs 538.09µs     2

So not a huge difference, but it's there.



Funny. I get similar results to yours above albeit with smaller differences. 
Usually < 5 percent.

But with subset_ROW4 I see no consistent difference.

In this example, it runs faster on average using `eval(mc)' to return the 
result:


arr <- array(rnorm(2^22),c(2^10,4,4,4))
i <- seq(1,length=10,by=100)
bench::mark(subset_ROW4(arr,i,FALSE), subset_ROW4(arr,i,TRUE))[,1:8]

# A tibble: 2 x 8
  expression                      min     mean   median      max `itr/sec` mem_alloc  n_gc
1 subset_ROW4(arr, i, FALSE)   28.9µs   34.9µs   32.1µs   1.36ms    28686.    5.05KB     5
2 subset_ROW4(arr, i, TRUE)    28.9µs     35µs   32.4µs 875.11µs    28572.    5.05KB     5




And on subsequent reps the lead switches back and forth.


Chuck







--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fredhutch.org
Phone:  (206) 667-5791
Fax:(206) 667-1319



Re: [Rd] Subsetting the "ROW"s of an object

2018-06-08 Thread Michael Lawrence
Actually, it's sort of the opposite. Everything becomes a sequence of
integers internally, even when the argument is missing. So the same
amount of work is done, basically. ALTREP will let us improve this
sort of thing.

Michael

On Fri, Jun 8, 2018 at 1:49 PM, Hadley Wickham  wrote:
> Hmmm, yes, there must be some special case in the C code to avoid
> recycling a length-1 logical vector:
>
> dims <- c(4, 4, 4, 1e5)
>
> arr <- array(rnorm(prod(dims)), dims)
> dim(arr)
> #> [1]      4      4      4 100000
> i <- c(1, 3)
>
> bench::mark(
>   arr[i, TRUE, TRUE, TRUE],
>   arr[i, , , ]
> )[c("expression", "min", "mean", "max")]
> #> # A tibble: 2 x 4
> #>   expression                    min     mean      max
> #> 1 arr[i, TRUE, TRUE, TRUE]   41.8ms   43.6ms   46.5ms
> #> 2 arr[i, , , ]               41.7ms   43.1ms   46.3ms
>
>
> On Fri, Jun 8, 2018 at 12:31 PM, Berry, Charles  wrote:
>>
>>
>>> On Jun 8, 2018, at 11:52 AM, Hadley Wickham  wrote:
>>>
>>> On Fri, Jun 8, 2018 at 11:38 AM, Berry, Charles  wrote:


> On Jun 8, 2018, at 10:37 AM, Hervé Pagès  wrote:
>
> Also the TRUEs cause problems if some dimensions are 0:
>
>> matrix(raw(0), nrow=5, ncol=0)[1:3 , TRUE]
> Error in matrix(raw(0), nrow = 5, ncol = 0)[1:3, TRUE] :
>   (subscript) logical subscript too long

 OK. But this is easy enough to handle.

>
> H.
>
> On 06/08/2018 10:29 AM, Hadley Wickham wrote:
>> I suspect this will have suboptimal performance since the TRUEs will
>> get recycled. (Maybe there is, or could be, ALTREP, support for
>> recycling)
>> Hadley


 AFAICS, it is not an issue. Taking

 arr <- array(rnorm(2^22),c(2^10,4,4,4))

 as a test case

 and using a function that will either use the literal code 
 `x[i,,,,drop=FALSE]' or `eval(mc)':

 subset_ROW4 <-
 function(x, i, useLiteral=FALSE)
 {
literal <- quote(x[i,,,,drop=FALSE])
mc <- quote(x[i])
nd <- max(1L, length(dim(x)))
mc[seq(4,length=nd-1L)] <- rep(TRUE, nd-1L)
mc[["drop"]] <- FALSE
if (useLiteral)
eval(literal)
else
eval(mc)
 }

 I get identical times with

 system.time(for (i in 1:1) 
 subset_ROW4(arr,seq(1,length=10,by=100),TRUE))

 and with

 system.time(for (i in 1:1) 
 subset_ROW4(arr,seq(1,length=10,by=100),FALSE))
>>>
>>> I think that's because you used a relatively low precision timing
>>> mechanism, and included the index generation in the timing. I see:
>>>
>>> arr <- array(rnorm(2^22),c(2^10,4,4,4))
>>> i <- seq(1,length = 10, by = 100)
>>>
>>> bench::mark(
>>>  arr[i, TRUE, TRUE, TRUE],
>>>  arr[i, , , ]
>>> )
>>> #> # A tibble: 2 x 1
>>> #>   expression         min     mean   median      max  n_gc
>>> #> 1 arr[i, TRUE,…    7.4µs   10.9µs  10.66µs   1.22ms     2
>>> #> 2 arr[i, , , ]    7.06µs    8.8µs   7.85µs 538.09µs     2
>>>
>>> So not a huge difference, but it's there.
>>
>>
>> Funny. I get similar results to yours above albeit with smaller differences. 
>> Usually < 5 percent.
>>
>> But with subset_ROW4 I see no consistent difference.
>>
>> In this example, it runs faster on average using `eval(mc)' to return the 
>> result:
>>
>>> arr <- array(rnorm(2^22),c(2^10,4,4,4))
>>> i <- seq(1,length=10,by=100)
>>> bench::mark(subset_ROW4(arr,i,FALSE), subset_ROW4(arr,i,TRUE))[,1:8]
>> # A tibble: 2 x 8
>>   expression                      min     mean   median      max `itr/sec` mem_alloc  n_gc
>> 1 subset_ROW4(arr, i, FALSE)   28.9µs   34.9µs   32.1µs   1.36ms    28686.    5.05KB     5
>> 2 subset_ROW4(arr, i, TRUE)    28.9µs     35µs   32.4µs 875.11µs    28572.    5.05KB     5
>>>
>>
>> And on subsequent reps the lead switches back and forth.
>>
>>
>> Chuck
>>
>
>
>
> --
> http://hadley.nz
>


Re: [Rd] Subsetting the "ROW"s of an object

2018-06-08 Thread Hadley Wickham
Hmmm, yes, there must be some special case in the C code to avoid
recycling a length-1 logical vector:

dims <- c(4, 4, 4, 1e5)

arr <- array(rnorm(prod(dims)), dims)
dim(arr)
#> [1]      4      4      4 100000
i <- c(1, 3)

bench::mark(
  arr[i, TRUE, TRUE, TRUE],
  arr[i, , , ]
)[c("expression", "min", "mean", "max")]
#> # A tibble: 2 x 4
#>   expression                    min     mean      max
#> 1 arr[i, TRUE, TRUE, TRUE]   41.8ms   43.6ms   46.5ms
#> 2 arr[i, , , ]               41.7ms   43.1ms   46.3ms


On Fri, Jun 8, 2018 at 12:31 PM, Berry, Charles  wrote:
>
>
>> On Jun 8, 2018, at 11:52 AM, Hadley Wickham  wrote:
>>
>> On Fri, Jun 8, 2018 at 11:38 AM, Berry, Charles  wrote:
>>>
>>>
 On Jun 8, 2018, at 10:37 AM, Hervé Pagès  wrote:

 Also the TRUEs cause problems if some dimensions are 0:

> matrix(raw(0), nrow=5, ncol=0)[1:3 , TRUE]
 Error in matrix(raw(0), nrow = 5, ncol = 0)[1:3, TRUE] :
   (subscript) logical subscript too long
>>>
>>> OK. But this is easy enough to handle.
>>>

 H.

 On 06/08/2018 10:29 AM, Hadley Wickham wrote:
> I suspect this will have suboptimal performance since the TRUEs will
> get recycled. (Maybe there is, or could be, ALTREP, support for
> recycling)
> Hadley
>>>
>>>
>>> AFAICS, it is not an issue. Taking
>>>
>>> arr <- array(rnorm(2^22),c(2^10,4,4,4))
>>>
>>> as a test case
>>>
>>> and using a function that will either use the literal code 
>>> `x[i,,,,drop=FALSE]' or `eval(mc)':
>>>
>>> subset_ROW4 <-
>>>  function(x, i, useLiteral=FALSE)
>>> {
>>>     literal <- quote(x[i,,,,drop=FALSE])
>>>     mc <- quote(x[i])
>>>     nd <- max(1L, length(dim(x)))
>>>     mc[seq(4,length=nd-1L)] <- rep(TRUE, nd-1L)
>>>     mc[["drop"]] <- FALSE
>>>     if (useLiteral)
>>>         eval(literal)
>>>     else
>>>         eval(mc)
>>>  }
>>>
>>> I get identical times with
>>>
>>> system.time(for (i in 1:1) 
>>> subset_ROW4(arr,seq(1,length=10,by=100),TRUE))
>>>
>>> and with
>>>
>>> system.time(for (i in 1:1) 
>>> subset_ROW4(arr,seq(1,length=10,by=100),FALSE))
>>
>> I think that's because you used a relatively low precision timing
>> mechanism, and included the index generation in the timing. I see:
>>
>> arr <- array(rnorm(2^22),c(2^10,4,4,4))
>> i <- seq(1,length = 10, by = 100)
>>
>> bench::mark(
>>  arr[i, TRUE, TRUE, TRUE],
>>  arr[i, , , ]
>> )
>> #> # A tibble: 2 x 1
>> #>   expression         min     mean   median      max  n_gc
>> #> 1 arr[i, TRUE,…    7.4µs   10.9µs  10.66µs   1.22ms     2
>> #> 2 arr[i, , , ]    7.06µs    8.8µs   7.85µs 538.09µs     2
>>
>> So not a huge difference, but it's there.
>
>
> Funny. I get similar results to yours above albeit with smaller differences. 
> Usually < 5 percent.
>
> But with subset_ROW4 I see no consistent difference.
>
> In this example, it runs faster on average using `eval(mc)' to return the 
> result:
>
>> arr <- array(rnorm(2^22),c(2^10,4,4,4))
>> i <- seq(1,length=10,by=100)
>> bench::mark(subset_ROW4(arr,i,FALSE), subset_ROW4(arr,i,TRUE))[,1:8]
> # A tibble: 2 x 8
>   expression                      min     mean   median      max `itr/sec` mem_alloc  n_gc
> 1 subset_ROW4(arr, i, FALSE)   28.9µs   34.9µs   32.1µs   1.36ms    28686.    5.05KB     5
> 2 subset_ROW4(arr, i, TRUE)    28.9µs     35µs   32.4µs 875.11µs    28572.    5.05KB     5
>>
>
> And on subsequent reps the lead switches back and forth.
>
>
> Chuck
>



-- 
http://hadley.nz



[Rd] Date class shows Inf as NA; this confuses the use of is.na()

2018-06-08 Thread Werner Grundlingh
In the following example, the date class shows Inf as NA

> as_date(Inf, origin = '1970-01-01')
[1] NA

This is misleading as is.na() reports incorrectly

> is.na(as_date(Inf, origin = '1970-01-01'))
[1] FALSE

The correct approach here would probably be to have an Inf (and -Inf)
*displayed* rather than NA.



Re: [Rd] Subsetting the "ROW"s of an object

2018-06-08 Thread Berry, Charles


> On Jun 8, 2018, at 11:52 AM, Hadley Wickham  wrote:
> 
> On Fri, Jun 8, 2018 at 11:38 AM, Berry, Charles  wrote:
>> 
>> 
>>> On Jun 8, 2018, at 10:37 AM, Hervé Pagès  wrote:
>>> 
>>> Also the TRUEs cause problems if some dimensions are 0:
>>> 
>>>  > matrix(raw(0), nrow=5, ncol=0)[1:3 , TRUE]
>>> Error in matrix(raw(0), nrow = 5, ncol = 0)[1:3, TRUE] :
>>>   (subscript) logical subscript too long
>> 
>> OK. But this is easy enough to handle.
>> 
>>> 
>>> H.
>>> 
>>> On 06/08/2018 10:29 AM, Hadley Wickham wrote:
>>>> I suspect this will have suboptimal performance since the TRUEs will
>>>> get recycled. (Maybe there is, or could be, ALTREP, support for
>>>> recycling)
>>>> Hadley
>> 
>> 
>> AFAICS, it is not an issue. Taking
>> 
>> arr <- array(rnorm(2^22),c(2^10,4,4,4))
>> 
>> as a test case
>> 
>> and using a function that will either use the literal code 
>> `x[i,,,,drop=FALSE]' or `eval(mc)':
>> 
>> subset_ROW4 <-
>>  function(x, i, useLiteral=FALSE)
>> {
>>     literal <- quote(x[i,,,,drop=FALSE])
>>     mc <- quote(x[i])
>>     nd <- max(1L, length(dim(x)))
>>     mc[seq(4,length=nd-1L)] <- rep(TRUE, nd-1L)
>>     mc[["drop"]] <- FALSE
>>     if (useLiteral)
>>         eval(literal)
>>     else
>>         eval(mc)
>>  }
>> 
>> I get identical times with
>> 
>> system.time(for (i in 1:1) subset_ROW4(arr,seq(1,length=10,by=100),TRUE))
>> 
>> and with
>> 
>> system.time(for (i in 1:1) 
>> subset_ROW4(arr,seq(1,length=10,by=100),FALSE))
> 
> I think that's because you used a relatively low precision timing
> mechanism, and included the index generation in the timing. I see:
> 
> arr <- array(rnorm(2^22),c(2^10,4,4,4))
> i <- seq(1,length = 10, by = 100)
> 
> bench::mark(
>  arr[i, TRUE, TRUE, TRUE],
>  arr[i, , , ]
> )
> #> # A tibble: 2 x 1
> #>   expression         min     mean   median      max  n_gc
> #> 1 arr[i, TRUE,…    7.4µs   10.9µs  10.66µs   1.22ms     2
> #> 2 arr[i, , , ]    7.06µs    8.8µs   7.85µs 538.09µs     2
> 
> So not a huge difference, but it's there.


Funny. I get similar results to yours above albeit with smaller differences. 
Usually < 5 percent.

But with subset_ROW4 I see no consistent difference.

In this example, it runs faster on average using `eval(mc)' to return the 
result:

> arr <- array(rnorm(2^22),c(2^10,4,4,4))
> i <- seq(1,length=10,by=100)
> bench::mark(subset_ROW4(arr,i,FALSE), subset_ROW4(arr,i,TRUE))[,1:8]
# A tibble: 2 x 8
  expression                      min     mean   median      max `itr/sec` mem_alloc  n_gc
1 subset_ROW4(arr, i, FALSE)   28.9µs   34.9µs   32.1µs   1.36ms    28686.    5.05KB     5
2 subset_ROW4(arr, i, TRUE)    28.9µs     35µs   32.4µs 875.11µs    28572.    5.05KB     5
>

And on subsequent reps the lead switches back and forth.


Chuck



Re: [Rd] Subsetting the "ROW"s of an object

2018-06-08 Thread Hervé Pagès

A missing subscript is still preferable to a TRUE though because it
carries the meaning "take it all". A TRUE also achieves this but via
implicit recycling. For example x[ , , ] and x[TRUE, TRUE, TRUE]
achieve the same thing (if length(x) != 0) and are both no-ops but
the subsetting code gets a chance to immediately and easily detect
the former as a no-op whereas it will probably not be able to do it
so easily for the latter. So in this case it will most likely generate
a copy of 'x' and fill the new array by taking a full walk on it.
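
A small base-R illustration of the point: for a non-empty array the two spellings return identical results, so any difference lies only in how much work the subsetting code has to do. The array below is an arbitrary example.

```r
x <- array(seq_len(24), c(2, 3, 4))

all_missing <- x[ , , ]             # "take it all" along every dimension
all_true    <- x[TRUE, TRUE, TRUE]  # same selection via recycled TRUEs

same_as_each_other <- identical(all_missing, all_true)
same_as_input      <- identical(all_missing, x)
```

Whether either form actually avoids the copy depends on the R internals discussed in this thread; the sketch only shows that the values agree.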

H.

On 06/08/2018 11:52 AM, Hadley Wickham wrote:

On Fri, Jun 8, 2018 at 11:38 AM, Berry, Charles  wrote:




On Jun 8, 2018, at 10:37 AM, Hervé Pagès  wrote:

Also the TRUEs cause problems if some dimensions are 0:

  > matrix(raw(0), nrow=5, ncol=0)[1:3 , TRUE]
  Error in matrix(raw(0), nrow = 5, ncol = 0)[1:3, TRUE] :
(subscript) logical subscript too long


OK. But this is easy enough to handle.



H.

On 06/08/2018 10:29 AM, Hadley Wickham wrote:

I suspect this will have suboptimal performance since the TRUEs will
get recycled. (Maybe there is, or could be, ALTREP, support for
recycling)
Hadley



AFAICS, it is not an issue. Taking

arr <- array(rnorm(2^22),c(2^10,4,4,4))

as a test case

and using a function that will either use the literal code `x[i,,,,drop=FALSE]'
or `eval(mc)':

subset_ROW4 <-
  function(x, i, useLiteral=FALSE)
{
 literal <- quote(x[i,,,,drop=FALSE])
 mc <- quote(x[i])
 nd <- max(1L, length(dim(x)))
 mc[seq(4,length=nd-1L)] <- rep(TRUE, nd-1L)
 mc[["drop"]] <- FALSE
 if (useLiteral)
 eval(literal)
 else
 eval(mc)
  }

I get identical times with

system.time(for (i in 1:1) subset_ROW4(arr,seq(1,length=10,by=100),TRUE))

and with

system.time(for (i in 1:1) subset_ROW4(arr,seq(1,length=10,by=100),FALSE))


I think that's because you used a relatively low precision timing
mechanism, and included the index generation in the timing. I see:

arr <- array(rnorm(2^22),c(2^10,4,4,4))
i <- seq(1,length = 10, by = 100)

bench::mark(
   arr[i, TRUE, TRUE, TRUE],
   arr[i, , , ]
)
#> # A tibble: 2 x 1
#>   expression         min     mean   median      max  n_gc
#> 1 arr[i, TRUE,…    7.4µs   10.9µs  10.66µs   1.22ms     2
#> 2 arr[i, , , ]    7.06µs    8.8µs   7.85µs 538.09µs     2

So not a huge difference, but it's there.

Hadley




--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fredhutch.org
Phone:  (206) 667-5791
Fax:(206) 667-1319



Re: [Rd] Subsetting the "ROW"s of an object

2018-06-08 Thread Hadley Wickham
On Fri, Jun 8, 2018 at 11:38 AM, Berry, Charles  wrote:
>
>
>> On Jun 8, 2018, at 10:37 AM, Hervé Pagès  wrote:
>>
>> Also the TRUEs cause problems if some dimensions are 0:
>>
>>  > matrix(raw(0), nrow=5, ncol=0)[1:3 , TRUE]
>>  Error in matrix(raw(0), nrow = 5, ncol = 0)[1:3, TRUE] :
>>(subscript) logical subscript too long
>
> OK. But this is easy enough to handle.
>
>>
>> H.
>>
>> On 06/08/2018 10:29 AM, Hadley Wickham wrote:
>>> I suspect this will have suboptimal performance since the TRUEs will
>>> get recycled. (Maybe there is, or could be, ALTREP, support for
>>> recycling)
>>> Hadley
>
>
> AFAICS, it is not an issue. Taking
>
> arr <- array(rnorm(2^22),c(2^10,4,4,4))
>
> as a test case
>
> and using a function that will either use the literal code 
> `x[i,,,,drop=FALSE]' or `eval(mc)':
>
> subset_ROW4 <-
>  function(x, i, useLiteral=FALSE)
> {
> literal <- quote(x[i,,,,drop=FALSE])
> mc <- quote(x[i])
> nd <- max(1L, length(dim(x)))
> mc[seq(4,length=nd-1L)] <- rep(TRUE, nd-1L)
> mc[["drop"]] <- FALSE
> if (useLiteral)
> eval(literal)
> else
> eval(mc)
>  }
>
> I get identical times with
>
> system.time(for (i in 1:1) subset_ROW4(arr,seq(1,length=10,by=100),TRUE))
>
> and with
>
> system.time(for (i in 1:1) subset_ROW4(arr,seq(1,length=10,by=100),FALSE))

I think that's because you used a relatively low precision timing
mechanism, and included the index generation in the timing. I see:

arr <- array(rnorm(2^22),c(2^10,4,4,4))
i <- seq(1,length = 10, by = 100)

bench::mark(
  arr[i, TRUE, TRUE, TRUE],
  arr[i, , , ]
)
#> # A tibble: 2 x 1
#>   expression         min     mean   median      max  n_gc
#> 1 arr[i, TRUE,…    7.4µs   10.9µs  10.66µs   1.22ms     2
#> 2 arr[i, , , ]    7.06µs    8.8µs   7.85µs 538.09µs     2

So not a huge difference, but it's there.
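
For readers without the bench package, roughly the same comparison can be made in base R by building the index once, outside the timed region, and repeating the operation many times. This is only a sketch: the array size and repetition count are arbitrary, and system.time() remains far coarser than bench::mark().

```r
arr <- array(rnorm(2^16), c(2^10, 4, 4, 4))
i <- seq(1, length.out = 10, by = 100)  # built once, outside the timing
reps <- 2000L

# Time only the subsetting itself, repeated to get above timer resolution.
t_true <- system.time(
  for (k in seq_len(reps)) arr[i, TRUE, TRUE, TRUE]
)[["elapsed"]]

t_missing <- system.time(
  for (k in seq_len(reps)) arr[i, , , ]
)[["elapsed"]]

# Both spellings must select the same values for the comparison to be fair.
same_result <- identical(arr[i, TRUE, TRUE, TRUE], arr[i, , , ])
```

Elapsed times from a loop like this vary from run to run, so small differences between t_true and t_missing should not be over-interpreted.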

Hadley


-- 
http://hadley.nz



Re: [Rd] Subsetting the "ROW"s of an object

2018-06-08 Thread Berry, Charles


> On Jun 8, 2018, at 10:37 AM, Hervé Pagès  wrote:
> 
> Also the TRUEs cause problems if some dimensions are 0:
> 
>  > matrix(raw(0), nrow=5, ncol=0)[1:3 , TRUE]
>  Error in matrix(raw(0), nrow = 5, ncol = 0)[1:3, TRUE] :
>(subscript) logical subscript too long

OK. But this is easy enough to handle. 

> 
> H.
> 
> On 06/08/2018 10:29 AM, Hadley Wickham wrote:
>> I suspect this will have suboptimal performance since the TRUEs will
>> get recycled. (Maybe there is, or could be, ALTREP, support for
>> recycling)
>> Hadley


AFAICS, it is not an issue. Taking

arr <- array(rnorm(2^22),c(2^10,4,4,4))

as a test case 

and using a function that will either use the literal code `x[i,,,,drop=FALSE]'
or `eval(mc)':

subset_ROW4 <-
 function(x, i, useLiteral=FALSE)
{
    # literal call for a 4-d array, with every trailing subscript missing
    literal <- quote(x[i,,,,drop=FALSE])
    # the same call built programmatically, with TRUE for each trailing subscript
    mc <- quote(x[i])
    nd <- max(1L, length(dim(x)))
    mc[seq(4,length=nd-1L)] <- rep(TRUE, nd-1L)
    mc[["drop"]] <- FALSE
    if (useLiteral)
        eval(literal)
    else
        eval(mc)
 }

I get identical times with

system.time(for (i in 1:10000) subset_ROW4(arr,seq(1,length=10,by=100),TRUE))

and with

system.time(for (i in 1:10000) subset_ROW4(arr,seq(1,length=10,by=100),FALSE))

Changing the dimensions to c(2^5, 2^7, 4, 4 ) and running something similar 
also shows equal times.

Chuck

>> On Fri, Jun 8, 2018 at 10:16 AM, Berry, Charles  wrote:
>>>
>>>
>>>> On Jun 8, 2018, at 8:45 AM, Hadley Wickham  wrote:
>>>>
>>>> Hi all,
>>>>
>>>> Is there a better way to subset the ROWs (in the sense of NROW) of
>>>> a vector, matrix, data frame or array than this?
>>>
>>>
>>> You can use TRUE to fill the subscripts for dimensions 2:nd
>>>
>>>>
>>>> subset_ROW <- function(x, i) {
>>>>   nd <- length(dim(x))
>>>>   if (nd <= 1L) {
>>>>     x[i]
>>>>   } else {
>>>>     dims <- rep(list(quote(expr = )), nd - 1L)
>>>>     do.call(`[`, c(list(quote(x), quote(i)), dims, list(drop = FALSE)))
>>>>   }
>>>> }
>>>
>>>
>>> subset_ROW <-
>>>     function(x,i)
>>> {
>>>     mc <- quote(x[i])
>>>     nd <- max(1L, length(dim(x)))
>>>     mc[seq(4, length=nd-1L)] <- rep(list(TRUE), nd - 1L)
>>>     mc[["drop"]] <- FALSE
>>>     eval(mc)
>>>
>>> }
>>>
>>>>
>>>> subset_ROW(1:10, 4:6)
>>>> #> [1] 4 5 6
>>>>
>>>> str(subset_ROW(array(1:10, c(10)), 2:4))
>>>> #>  int [1:3(1d)] 2 3 4
>>>> str(subset_ROW(array(1:10, c(10, 1)), 2:4))
>>>> #>  int [1:3, 1] 2 3 4
>>>> str(subset_ROW(array(1:10, c(5, 2)), 2:4))
>>>> #>  int [1:3, 1:2] 2 3 4 7 8 9
>>>> str(subset_ROW(array(1:10, c(10, 1, 1)), 2:4))
>>>> #>  int [1:3, 1, 1] 2 3 4
>>>>
>>>> subset_ROW(data.frame(x = 1:10, y = 10:1), 2:4)
>>>> #>   x y
>>>> #> 2 2 9
>>>> #> 3 3 8
>>>> #> 4 4 7
>>>>
>>>
>>> HTH,
>>>
>>> Chuck
>>>
> 
> -- 
> Hervé Pagès
> 
> Program in Computational Biology
> Division of Public Health Sciences
> Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N, M1-B514
> P.O. Box 19024
> Seattle, WA 98109-1024
> 
> E-mail: hpa...@fredhutch.org
> Phone:  (206) 667-5791
> Fax:(206) 667-1319



Re: [Rd] Subsetting the "ROW"s of an object

2018-06-08 Thread Hervé Pagès

Also the TRUEs cause problems if some dimensions are 0:

  > matrix(raw(0), nrow=5, ncol=0)[1:3 , TRUE]
  Error in matrix(raw(0), nrow = 5, ncol = 0)[1:3, TRUE] :
(subscript) logical subscript too long
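
The failure is easy to reproduce and to sidestep. In the sketch below (base R; object names are illustrative) the TRUE subscript is wrapped in tryCatch() so that the script keeps running, while a missing subscript works on the same zero-column matrix.

```r
m <- matrix(raw(0), nrow = 5, ncol = 0)

# TRUE as the column subscript: capture the condition instead of stopping.
with_true <- tryCatch(m[1:3, TRUE], error = function(e) conditionMessage(e))

# A missing column subscript means "take everything", even when there is nothing.
with_missing <- m[1:3, , drop = FALSE]
dims_missing <- dim(with_missing)
```

If the TRUE form errors as shown above, with_true holds the error message; with_missing is a 3 x 0 raw matrix either way.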

H.

On 06/08/2018 10:29 AM, Hadley Wickham wrote:

I suspect this will have suboptimal performance since the TRUEs will
get recycled. (Maybe there is, or could be, ALTREP, support for
recycling)
Hadley

On Fri, Jun 8, 2018 at 10:16 AM, Berry, Charles  wrote:




On Jun 8, 2018, at 8:45 AM, Hadley Wickham  wrote:

Hi all,

Is there a better way to subset the ROWs (in the sense of NROW) of
a vector, matrix, data frame or array than this?



You can use TRUE to fill the subscripts for dimensions 2:nd



subset_ROW <- function(x, i) {
  nd <- length(dim(x))
  if (nd <= 1L) {
x[i]
  } else {
dims <- rep(list(quote(expr = )), nd - 1L)
do.call(`[`, c(list(quote(x), quote(i)), dims, list(drop = FALSE)))
  }
}



subset_ROW <-
 function(x,i)
{
 mc <- quote(x[i])
 nd <- max(1L, length(dim(x)))
 mc[seq(4, length=nd-1L)] <- rep(list(TRUE), nd - 1L)
 mc[["drop"]] <- FALSE
 eval(mc)

}



subset_ROW(1:10, 4:6)
#> [1] 4 5 6

str(subset_ROW(array(1:10, c(10)), 2:4))
#>  int [1:3(1d)] 2 3 4
str(subset_ROW(array(1:10, c(10, 1)), 2:4))
#>  int [1:3, 1] 2 3 4
str(subset_ROW(array(1:10, c(5, 2)), 2:4))
#>  int [1:3, 1:2] 2 3 4 7 8 9
str(subset_ROW(array(1:10, c(10, 1, 1)), 2:4))
#>  int [1:3, 1, 1] 2 3 4

subset_ROW(data.frame(x = 1:10, y = 10:1), 2:4)
#>   x y
#> 2 2 9
#> 3 3 8
#> 4 4 7



HTH,

Chuck







--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fredhutch.org
Phone:  (206) 667-5791
Fax:(206) 667-1319



Re: [Rd] Subsetting the "ROW"s of an object

2018-06-08 Thread Hervé Pagès

On 06/08/2018 10:32 AM, Hervé Pagès wrote:

On 06/08/2018 10:15 AM, Michael Lawrence wrote:

There probably should be an abstraction for this. In S4Vectors, we
have extractROWS().


FWIW the code in S4Vectors that does what your subset_ROW() does is:


https://github.com/Bioconductor/S4Vectors/blob/04cc9516af986b30445e99fd1337f13321b7b4f6/R/subsetting-utils.R#L466-L476


Wrong link sorry. Here is the correct one:


https://github.com/Bioconductor/S4Vectors/blob/04cc9516af986b30445e99fd1337f13321b7b4f6/R/subsetting-utils.R#L453-L464

H.




(This is the default "extractROWS" method.)

Except for the normalization of 'i', it does the same as your
subset_ROW(). I don't know how to do this without generating a call
with missing arguments.

H.



Michael

On Fri, Jun 8, 2018 at 8:45 AM, Hadley Wickham  
wrote:

Hi all,

Is there a better way to subset the ROWs (in the sense of NROW) of
a vector, matrix, data frame or array than this?

subset_ROW <- function(x, i) {
  nd <- length(dim(x))
  if (nd <= 1L) {
    x[i]
  } else {
    dims <- rep(list(quote(expr = )), nd - 1L)
    do.call(`[`, c(list(quote(x), quote(i)), dims, list(drop = FALSE)))
  }
}

subset_ROW(1:10, 4:6)
#> [1] 4 5 6

str(subset_ROW(array(1:10, c(10)), 2:4))
#>  int [1:3(1d)] 2 3 4
str(subset_ROW(array(1:10, c(10, 1)), 2:4))
#>  int [1:3, 1] 2 3 4
str(subset_ROW(array(1:10, c(5, 2)), 2:4))
#>  int [1:3, 1:2] 2 3 4 7 8 9
str(subset_ROW(array(1:10, c(10, 1, 1)), 2:4))
#>  int [1:3, 1, 1] 2 3 4

subset_ROW(data.frame(x = 1:10, y = 10:1), 2:4)
#>   x y
#> 2 2 9
#> 3 3 8
#> 4 4 7

It seems like there should be a way to do this that doesn't require
generating a call with missing arguments, but I can't think of it.

Thanks!

Hadley

--
http://hadley.nz




--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fredhutch.org
Phone:  (206) 667-5791
Fax:(206) 667-1319



Re: [Rd] Subsetting the "ROW"s of an object

2018-06-08 Thread Hervé Pagès

On 06/08/2018 10:15 AM, Michael Lawrence wrote:

There probably should be an abstraction for this. In S4Vectors, we
have extractROWS().


FWIW the code in S4Vectors that does what your subset_ROW() does is:


https://github.com/Bioconductor/S4Vectors/blob/04cc9516af986b30445e99fd1337f13321b7b4f6/R/subsetting-utils.R#L466-L476

(This is the default "extractROWS" method.)

Except for the normalization of 'i', it does the same as your
subset_ROW(). I don't know how to do this without generating a call
with missing arguments.
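
One possible way to avoid missing arguments, sketched here under the assumption that materializing explicit index vectors is acceptable, is to pass seq_len() of each trailing extent instead. `subset_ROW_idx` is a hypothetical name, not the S4Vectors method, and the explicit indices may well be slower than missing arguments on large arrays.

```r
# Hypothetical alternative: index every trailing dimension explicitly
# with seq_len(extent) rather than leaving the subscript missing.
subset_ROW_idx <- function(x, i) {
  d <- dim(x)
  if (length(d) <= 1L)
    return(x[i])
  # One full index vector per dimension after the first;
  # seq_len(0) is integer(0), so zero-extent dimensions are handled too.
  trailing <- lapply(d[-1L], seq_len)
  do.call(`[`, c(list(x, i), trailing, list(drop = FALSE)))
}

str(subset_ROW_idx(array(1:10, c(5, 2)), 2:4))
str(subset_ROW_idx(matrix(raw(0), nrow = 5, ncol = 0), 1:3))
subset_ROW_idx(data.frame(x = 1:10, y = 10:1), 2:4)
```

The trade-off is that no call with empty arguments is built, at the cost of allocating an index vector for every trailing dimension.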

H.



Michael

On Fri, Jun 8, 2018 at 8:45 AM, Hadley Wickham  wrote:

Hi all,

Is there a better way to subset the ROWs (in the sense of NROW) of
a vector, matrix, data frame or array than this?

subset_ROW <- function(x, i) {
  nd <- length(dim(x))
  if (nd <= 1L) {
    x[i]
  } else {
    dims <- rep(list(quote(expr = )), nd - 1L)
    do.call(`[`, c(list(quote(x), quote(i)), dims, list(drop = FALSE)))
  }
}

subset_ROW(1:10, 4:6)
#> [1] 4 5 6

str(subset_ROW(array(1:10, c(10)), 2:4))
#>  int [1:3(1d)] 2 3 4
str(subset_ROW(array(1:10, c(10, 1)), 2:4))
#>  int [1:3, 1] 2 3 4
str(subset_ROW(array(1:10, c(5, 2)), 2:4))
#>  int [1:3, 1:2] 2 3 4 7 8 9
str(subset_ROW(array(1:10, c(10, 1, 1)), 2:4))
#>  int [1:3, 1, 1] 2 3 4

subset_ROW(data.frame(x = 1:10, y = 10:1), 2:4)
#>   x y
#> 2 2 9
#> 3 3 8
#> 4 4 7

It seems like there should be a way to do this that doesn't require
generating a call with missing arguments, but I can't think of it.

Thanks!

Hadley

--
http://hadley.nz



--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fredhutch.org
Phone:  (206) 667-5791
Fax:(206) 667-1319



Re: [Rd] Subsetting the "ROW"s of an object

2018-06-08 Thread Hadley Wickham
I suspect this will have suboptimal performance since the TRUEs will
get recycled. (Maybe there is, or could be, ALTREP, support for
recycling)
Hadley

On Fri, Jun 8, 2018 at 10:16 AM, Berry, Charles  wrote:
>
>
>> On Jun 8, 2018, at 8:45 AM, Hadley Wickham  wrote:
>>
>> Hi all,
>>
>> Is there a better way to subset the ROWs (in the sense of NROW) of
>> a vector, matrix, data frame or array than this?
>
>
> You can use TRUE to fill the subscripts for dimensions 2:nd
>
>>
>> subset_ROW <- function(x, i) {
>>   nd <- length(dim(x))
>>   if (nd <= 1L) {
>>     x[i]
>>   } else {
>>     dims <- rep(list(quote(expr = )), nd - 1L)
>>     do.call(`[`, c(list(quote(x), quote(i)), dims, list(drop = FALSE)))
>>   }
>> }
>
>
> subset_ROW <-
> function(x,i)
> {
> mc <- quote(x[i])
> nd <- max(1L, length(dim(x)))
> mc[seq(4, length=nd-1L)] <- rep(list(TRUE), nd - 1L)
> mc[["drop"]] <- FALSE
> eval(mc)
>
> }
>
>>
>> subset_ROW(1:10, 4:6)
>> #> [1] 4 5 6
>>
>> str(subset_ROW(array(1:10, c(10)), 2:4))
>> #>  int [1:3(1d)] 2 3 4
>> str(subset_ROW(array(1:10, c(10, 1)), 2:4))
>> #>  int [1:3, 1] 2 3 4
>> str(subset_ROW(array(1:10, c(5, 2)), 2:4))
>> #>  int [1:3, 1:2] 2 3 4 7 8 9
>> str(subset_ROW(array(1:10, c(10, 1, 1)), 2:4))
>> #>  int [1:3, 1, 1] 2 3 4
>>
>> subset_ROW(data.frame(x = 1:10, y = 10:1), 2:4)
>> #>   x y
>> #> 2 2 9
>> #> 3 3 8
>> #> 4 4 7
>>
>
> HTH,
>
> Chuck
>



-- 
http://hadley.nz



Re: [Rd] Subsetting the "ROW"s of an object

2018-06-08 Thread Berry, Charles



> On Jun 8, 2018, at 8:45 AM, Hadley Wickham  wrote:
> 
> Hi all,
> 
> Is there a better way to subset the ROWs (in the sense of NROW) of
> a vector, matrix, data frame or array than this?


You can use TRUE to fill the subscripts for dimensions 2:nd

> 
> subset_ROW <- function(x, i) {
>   nd <- length(dim(x))
>   if (nd <= 1L) {
>     x[i]
>   } else {
>     dims <- rep(list(quote(expr = )), nd - 1L)
>     do.call(`[`, c(list(quote(x), quote(i)), dims, list(drop = FALSE)))
>   }
> }


subset_ROW <-
function(x,i)
{
mc <- quote(x[i])
nd <- max(1L, length(dim(x)))
mc[seq(4, length=nd-1L)] <- rep(list(TRUE), nd - 1L)
mc[["drop"]] <- FALSE
eval(mc)

}
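
To make the call manipulation above easier to follow, here is a small sketch that builds the call without evaluating it. `build_call` is a hypothetical helper that takes the number of dimensions directly; it only mirrors the steps of the function above.

```r
# Hypothetical helper: construct (but do not evaluate) the call that the
# TRUE-filling subset_ROW() would evaluate for an object with nd dimensions.
build_call <- function(nd) {
  mc <- quote(x[i])                      # a call of length 3: `[`, x, i
  nd <- max(1L, nd)
  # Positions 4, 5, ... hold the subscripts for dimensions 2, 3, ...
  mc[seq(4, length.out = nd - 1L)] <- rep(list(TRUE), nd - 1L)
  mc[["drop"]] <- FALSE                  # appended as a named argument
  mc
}

calls <- vapply(1:3, function(nd) deparse(build_call(nd)), character(1))
print(calls)
#> [1] "x[i, drop = FALSE]"             "x[i, TRUE, drop = FALSE]"
#> [3] "x[i, TRUE, TRUE, drop = FALSE]"
```

Each TRUE is a length-1 logical subscript that selects the whole extent of its dimension.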

> 
> subset_ROW(1:10, 4:6)
> #> [1] 4 5 6
> 
> str(subset_ROW(array(1:10, c(10)), 2:4))
> #>  int [1:3(1d)] 2 3 4
> str(subset_ROW(array(1:10, c(10, 1)), 2:4))
> #>  int [1:3, 1] 2 3 4
> str(subset_ROW(array(1:10, c(5, 2)), 2:4))
> #>  int [1:3, 1:2] 2 3 4 7 8 9
> str(subset_ROW(array(1:10, c(10, 1, 1)), 2:4))
> #>  int [1:3, 1, 1] 2 3 4
> 
> subset_ROW(data.frame(x = 1:10, y = 10:1), 2:4)
> #>   x y
> #> 2 2 9
> #> 3 3 8
> #> 4 4 7
> 

HTH,

Chuck



Re: [Rd] Subsetting the "ROW"s of an object

2018-06-08 Thread Michael Lawrence
There probably should be an abstraction for this. In S4Vectors, we
have extractROWS().

Michael

On Fri, Jun 8, 2018 at 8:45 AM, Hadley Wickham  wrote:
> Hi all,
>
> Is there a better way to subset the ROWs (in the sense of NROW) of
> a vector, matrix, data frame or array than this?
>
> subset_ROW <- function(x, i) {
>   nd <- length(dim(x))
>   if (nd <= 1L) {
>     x[i]
>   } else {
>     dims <- rep(list(quote(expr = )), nd - 1L)
>     do.call(`[`, c(list(quote(x), quote(i)), dims, list(drop = FALSE)))
>   }
> }
>
> subset_ROW(1:10, 4:6)
> #> [1] 4 5 6
>
> str(subset_ROW(array(1:10, c(10)), 2:4))
> #>  int [1:3(1d)] 2 3 4
> str(subset_ROW(array(1:10, c(10, 1)), 2:4))
> #>  int [1:3, 1] 2 3 4
> str(subset_ROW(array(1:10, c(5, 2)), 2:4))
> #>  int [1:3, 1:2] 2 3 4 7 8 9
> str(subset_ROW(array(1:10, c(10, 1, 1)), 2:4))
> #>  int [1:3, 1, 1] 2 3 4
>
> subset_ROW(data.frame(x = 1:10, y = 10:1), 2:4)
> #>   x y
> #> 2 2 9
> #> 3 3 8
> #> 4 4 7
>
> It seems like there should be a way to do this that doesn't require
> generating a call with missing arguments, but I can't think of it.
>
> Thanks!
>
> Hadley
>
> --
> http://hadley.nz
>
>



Re: [Rd] Subsetting the "ROW"s of an object

2018-06-08 Thread Iñaki Úcar
Sorry, without remnants from other attempts:

subset_ROW <- function(x, i) {
  nd <- length(dim(x))
  if (nd <= 1L)
    return(x[i])
  apply(x, 2:nd, `[`, i, drop=FALSE)
}

On Fri, Jun 8, 2018 at 19:07, Iñaki Úcar wrote:
>
> On Fri, Jun 8, 2018 at 17:46, Hadley Wickham wrote:
> >
> > Hi all,
> >
> > Is there a better way to subset the ROWs (in the sense of NROW) of
> > a vector, matrix, data frame or array than this?
> >
> > subset_ROW <- function(x, i) {
> >   nd <- length(dim(x))
> >   if (nd <= 1L) {
> >     x[i]
> >   } else {
> >     dims <- rep(list(quote(expr = )), nd - 1L)
> >     do.call(`[`, c(list(quote(x), quote(i)), dims, list(drop = FALSE)))
> >   }
> > }
> >
> > subset_ROW(1:10, 4:6)
> > #> [1] 4 5 6
> >
> > str(subset_ROW(array(1:10, c(10)), 2:4))
> > #>  int [1:3(1d)] 2 3 4
> > str(subset_ROW(array(1:10, c(10, 1)), 2:4))
> > #>  int [1:3, 1] 2 3 4
> > str(subset_ROW(array(1:10, c(5, 2)), 2:4))
> > #>  int [1:3, 1:2] 2 3 4 7 8 9
> > str(subset_ROW(array(1:10, c(10, 1, 1)), 2:4))
> > #>  int [1:3, 1, 1] 2 3 4
> >
> > subset_ROW(data.frame(x = 1:10, y = 10:1), 2:4)
> > #>   x y
> > #> 2 2 9
> > #> 3 3 8
> > #> 4 4 7
> >
> > It seems like there should be a way to do this that doesn't require
> > generating a call with missing arguments, but I can't think of it.
>
> The following code seems to work. The only minor drawback is that, for
> the last case, the output is not a data frame.
>
> subset_ROW <- function(x, i) {
>   nd <- length(dim(x))
>   if (nd <= 1L)
>     return(x[i])
>   xx <- apply(x, 2:nd, `[`, i, drop=FALSE)
>   dim(xx) <- c(length(i), dim(x)[-1])
>   xx
> }
>
> Iñaki
>
> >
> > Thanks!
> >
> > Hadley
> >
> > --
> > http://hadley.nz
> >



-- 
Iñaki Úcar
http://www.enchufa2.es
@Enchufa2



Re: [Rd] Subsetting the "ROW"s of an object

2018-06-08 Thread Iñaki Úcar
On Fri, Jun 8, 2018 at 17:46, Hadley Wickham wrote:
>
> Hi all,
>
> Is there a better way to subset the ROWs (in the sense of NROW) of
> a vector, matrix, data frame or array than this?
>
> subset_ROW <- function(x, i) {
>   nd <- length(dim(x))
>   if (nd <= 1L) {
>     x[i]
>   } else {
>     dims <- rep(list(quote(expr = )), nd - 1L)
>     do.call(`[`, c(list(quote(x), quote(i)), dims, list(drop = FALSE)))
>   }
> }
>
> subset_ROW(1:10, 4:6)
> #> [1] 4 5 6
>
> str(subset_ROW(array(1:10, c(10)), 2:4))
> #>  int [1:3(1d)] 2 3 4
> str(subset_ROW(array(1:10, c(10, 1)), 2:4))
> #>  int [1:3, 1] 2 3 4
> str(subset_ROW(array(1:10, c(5, 2)), 2:4))
> #>  int [1:3, 1:2] 2 3 4 7 8 9
> str(subset_ROW(array(1:10, c(10, 1, 1)), 2:4))
> #>  int [1:3, 1, 1] 2 3 4
>
> subset_ROW(data.frame(x = 1:10, y = 10:1), 2:4)
> #>   x y
> #> 2 2 9
> #> 3 3 8
> #> 4 4 7
>
> It seems like there should be a way to do this that doesn't require
> generating a call with missing arguments, but I can't think of it.

The following code seems to work. The only minor drawback is that, for
the last case, the output is not a data frame.

subset_ROW <- function(x, i) {
  nd <- length(dim(x))
  if (nd <= 1L)
    return(x[i])
  xx <- apply(x, 2:nd, `[`, i, drop=FALSE)
  dim(xx) <- c(length(i), dim(x)[-1])
  xx
}
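
The drawback mentioned above can be seen directly. The sketch below repeats the function under the illustrative name `subset_ROW_apply`; as far as I understand, apply() coerces a data frame with as.matrix() before looping, so a matrix comes back and the column names are lost when dim() is reassigned.

```r
# Same body as the function above, renamed for illustration only.
subset_ROW_apply <- function(x, i) {
  nd <- length(dim(x))
  if (nd <= 1L)
    return(x[i])
  xx <- apply(x, 2:nd, `[`, i, drop = FALSE)
  dim(xx) <- c(length(i), dim(x)[-1])
  xx
}

df <- data.frame(x = 1:10, y = 10:1)
res <- subset_ROW_apply(df, 2:4)

is.data.frame(res)   # the data frame class is not preserved
is.matrix(res)
dim(res)
```

The values and shape are right, but callers that rely on data frame semantics (column names, mixed column types) would need a separate branch for that case.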

Iñaki

>
> Thanks!
>
> Hadley
>
> --
> http://hadley.nz
>



[Rd] Subsetting the "ROW"s of an object

2018-06-08 Thread Hadley Wickham
Hi all,

Is there a better way to subset the ROWs (in the sense of NROW) of
a vector, matrix, data frame or array than this?

subset_ROW <- function(x, i) {
  nd <- length(dim(x))
  if (nd <= 1L) {
    x[i]
  } else {
    dims <- rep(list(quote(expr = )), nd - 1L)
    do.call(`[`, c(list(quote(x), quote(i)), dims, list(drop = FALSE)))
  }
}

subset_ROW(1:10, 4:6)
#> [1] 4 5 6

str(subset_ROW(array(1:10, c(10)), 2:4))
#>  int [1:3(1d)] 2 3 4
str(subset_ROW(array(1:10, c(10, 1)), 2:4))
#>  int [1:3, 1] 2 3 4
str(subset_ROW(array(1:10, c(5, 2)), 2:4))
#>  int [1:3, 1:2] 2 3 4 7 8 9
str(subset_ROW(array(1:10, c(10, 1, 1)), 2:4))
#>  int [1:3, 1, 1] 2 3 4

subset_ROW(data.frame(x = 1:10, y = 10:1), 2:4)
#>   x y
#> 2 2 9
#> 3 3 8
#> 4 4 7

It seems like there should be a way to do this that doesn't require
generating a call with missing arguments, but I can't think of it.
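
For readers unfamiliar with the idiom in subset_ROW() above, here is a short sketch of what "a call with missing arguments" means. quote(expr = ) yields the empty symbol that R uses to represent a missing argument; the names `parts` and `cl` are illustrative only.

```r
# The empty symbol is placed straight into a list: binding it to a
# variable and then reading that variable would raise a
# "missing argument" error.
parts <- c(list(as.name("["), quote(x), quote(i)),
           list(quote(expr = )),          # stands in for the empty 2nd subscript
           list(drop = FALSE))
cl <- as.call(parts)
print(cl)
#> x[i, , drop = FALSE]

# Evaluating the constructed call behaves like typing x[i, , drop = FALSE].
x <- array(1:10, c(5, 2))
i <- 2:4
res <- eval(cl)
str(res)
```

do.call() in subset_ROW() builds and evaluates the same kind of call in one step.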

Thanks!

Hadley

-- 
http://hadley.nz



Re: [R-pkg-devel] pkg built with static vignette introduces dependency on R > = 3.5.0

2018-06-08 Thread Duncan Murdoch

On 08/06/2018 5:12 AM, Roman Flury wrote:

Dear all,

I'm working on a package which contains a static vignette. If the pkg is built 
with R version 3.3.3 everything works fine, but if built with the current 
R-devel version I get the warning:


 NB: this package now depends on R (>= 3.5.0)
 WARNING: Added dependency on R >= 3.5.0 because serialized objects in  
serialize/load version 3 cannot be read in older versions of R.  File(s) 
containing such objects:  'staticvignettepkg/build/vignette.rds'


and as described the dependency on R >= 3.5.0 is added to the DESCRIPTION file.

I found possible context for this behaviour in the R-devel NEWS 
https://cran.r-project.org/doc/manuals/r-devel/NEWS.html:

''R has new serialization format (version 3) which supports custom 
serialization of ALTREP framework objects. These objects can still be 
serialized in format 2, but less efficiently. Serialization format 3 also 
records the current native encoding of unflagged strings and converts them when 
de-serialized in R running under different native encoding. Format 3 comes with 
new serialization magic numbers (RDA3, RDB3, RDX3). Format 3 can be selected by 
version = 3 in save(), serialize() and saveRDS(), but format 2 remains the 
default for all serialization and saving of the workspace. Serialized data in 
format 3 cannot be read by versions of R prior to version 3.5.0.''

but I can not see why or how this should have an influence on a static vignette?

To illustrate and reproduce my issue I created a git repository 
https://github.com/romanflury/staticvignette with a minimal package, containing 
an arbitrary pdf document as a static vignette. The git repository includes the 
respective session infos also.

I hope to avoid this dependency, since I do not want to force users to update 
their R version.


When R builds a package with vignettes, it adds a file 
build/vignette.rds to the tarball that contains information about the 
vignettes.  Since R-devel is switching the format of .rds files, this 
file is in the new format, which can't be read by R versions prior to 3.5.0.


Generally speaking there is no guarantee that R x.y.z can handle a 
package built in a later version, and this is an example of that 
problem:  R x.y.z can't handle a package built in x.(y+2).z.


So the solution is to build your tarball in R 3.5.x or earlier, not in 
R-devel, or to add the dependency mentioned in the warning message.
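
A rough way to see the two formats side by side, assuming R >= 3.5.0 (needed for version = 3) and the default binary XDR output of serialize(): the stream starts with the two bytes "X\n", followed by the format version as a big-endian 4-byte integer. `ser_version` is a hypothetical helper written for this sketch, not part of R.

```r
# Hypothetical helper: read the format version from the header of a
# raw vector produced by serialize(obj, NULL) in its default XDR form.
ser_version <- function(bytes) {
  stopifnot(identical(bytes[1:2], charToRaw("X\n")))
  readBin(bytes[3:6], what = "integer", n = 1L, size = 4L, endian = "big")
}

obj <- list(title = "example")
v2 <- ser_version(serialize(obj, NULL, version = 2))
v3 <- ser_version(serialize(obj, NULL, version = 3))
c(v2 = v2, v3 = v3)
```

A stream written with version 3 is what older R versions cannot read, which is why the build adds the dependency when build/vignette.rds is written in that format.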


Duncan Murdoch

__
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel


[R-pkg-devel] pkg built with static vignette introduces dependency on R > = 3.5.0

2018-06-08 Thread Roman Flury
Dear all,

I'm working on a package which contains a static vignette. If the pkg is built 
with R version 3.3.3 everything works fine, but if built with the current 
R-devel version I get the warning:

> NB: this package now depends on R (>= 3.5.0)
> WARNING: Added dependency on R >= 3.5.0 because serialized objects in  
> serialize/load version 3 cannot be read in older versions of R.  File(s) 
> containing such objects:  'staticvignettepkg/build/vignette.rds'

and as described the dependency on R >= 3.5.0 is added to the DESCRIPTION file.

I found possible context for this behaviour in the R-devel NEWS 
https://cran.r-project.org/doc/manuals/r-devel/NEWS.html:

''R has new serialization format (version 3) which supports custom 
serialization of ALTREP framework objects. These objects can still be 
serialized in format 2, but less efficiently. Serialization format 3 also 
records the current native encoding of unflagged strings and converts them when 
de-serialized in R running under different native encoding. Format 3 comes with 
new serialization magic numbers (RDA3, RDB3, RDX3). Format 3 can be selected by 
version = 3 in save(), serialize() and saveRDS(), but format 2 remains the 
default for all serialization and saving of the workspace. Serialized data in 
format 3 cannot be read by versions of R prior to version 3.5.0.''

but I can not see why or how this should have an influence on a static vignette?

To illustrate and reproduce my issue I created a git repository 
https://github.com/romanflury/staticvignette with a minimal package, containing 
an arbitrary pdf document as a static vignette. The git repository includes the 
respective session infos also.

I hope to avoid this dependency, since I do not want to force users to update 
their R version.

Many thanks,

Roman



Re: [Bioc-devel] "Reviving" an existing workflow

2018-06-08 Thread Bernd Klaus
Hello Lori,

thanks for your support! The workflow package is now on the tracker:

https://github.com/Bioconductor/Contributions/issues/765

Cheers,

Bernd

On Tue, 2018-06-05 at 15:33 +, Shepherd, Lori wrote:
> Excellent news!  The RAM requirement was a legacy from the old
> builders and we have since improved the capabilities so you should be
> alright with the package as is.  I will look at updating the
> documentation on the website and thank you for bringing it to our
> attention. 
> 
> We look forward to your submission on the tracker. 
> 
> Cheers, 
> 
> Lori Shepherd
> Bioconductor Core Team
> Roswell Park Cancer Institute
> Department of Biostatistics & Bioinformatics
> Elm & Carlton Streets
> Buffalo, New York 14263
> From: Bernd Klaus 
> Sent: Tuesday, June 5, 2018 2:53:41 AM
> To: Shepherd, Lori; bioc-devel
> Cc: Stefanie Reisenauer
> Subject: Re: [Bioc-devel] "Reviving" an existing workflow
>  
> Hello Lori,
> 
> following up on this, we have now finished the 
> revision of the workflow and are ready to submit it to
> Bioc:
> 
> https://github.com/Steffireise/maEndToEnd
> 
> However, using a call to gc() at end of the 
> workflow and rendering it via rmarkdown::render 
> gives a memory usage of ~ 4.5 GB,
> while the guidelines:
> 
> http://bioconductor.org/developers/how-to/workflows/
> 
> say that "The package should require <= 4GB RAM"
> 
> Is it okay to submit the workflow as it is, or
> do we have to look into splitting it up as in:
> 
> http://bioconductor.org/packages/devel/workflows/html/simpleSingleCell.html
> 
> ?
> 
> Thanks and best wishes,
> 
> Bernd
> 
> 
> On Thu, 2018-05-03 at 18:41 +0200, Bernd Klaus wrote:
> > Hello Lori,
> > 
> > thanks a lot for your prompt and your constructive feedback! 
> > 
> > We will revise the workflow accordingly and then submit it 
> > to the submission tracker.
> > 
> > Cheers,
> > 
> > Bernd
> > 
> > On Thu, 2018-05-03 at 15:50 +, Shepherd, Lori wrote:
> > > 
> > > Hello Bernd, 
> > > 
> > > Thanks for reaching out.  It looks like the last build on Jenkin's
> > > was two release cycles ago (Oct 2), because of the time lapse and
> > > that there have been quite a few changes to depending packages, and
> > > that it was never "published",  we would recommend putting the
> > > workflow on the submission tracker.
> > > 
> > > We understand that it isn't a new submission, but this will allow the
> > > debugging process to be separate from the daily builder.
> > > 
> > > https://github.com/Bioconductor/Contributions
> > > 
> > > 
> > > Thank you for converting your workflow to a package. Please also review
> > > http://bioconductor.org/developers/how-to/workflows/ as it outlines
> > > some important aspects to implement that are new from when the
> > > workflow was originally submitted.
> > > Most importantly:
> > > - Add "Workflow: True" as a field in the DESCRIPTION.
> > > - Update the biocViews terms to be part of the workflow biocViews:
> > >   http://bioconductor.org/packages/devel/BiocViews.html#___Workflow
> > > - See the consistent formatting section.  We will require the vignette
> > >   be updated to use BiocStyle and include author and affiliations,
> > >   date, and versioning information.
> > > 
> > > 
> > > I did try to build your package locally and ran into quite a few
> > > issues.  
> > > The first run: 
> > > Quitting from lines 1510-1512 (MA-Workflow.Rmd) 
> > > Error: processing vignette 'MA-Workflow.Rmd' failed with
> > > diagnostics:
> > > package Rgraphviz is required
> > > 
> > > So it seems like you need to add Rgraphviz
> > > 
> > > The second run after installing Rgraphviz:
> > > Quitting from lines 1578-1579 (MA-Workflow.Rmd) 
> > > Error: processing vignette 'MA-Workflow.Rmd' failed with
> > > diagnostics:
> > > could not find function "enrichMap"
> > > 
> > > I think this stems from the fact that DOSE moved a lot of their
> > > plotting functions or the plotting functionality to the enrichplot
> > > package but something that will have to be remedied.
> > > 
> > > I also would git rm the vignette/MA-Workflow.html  - this should be
> > > generated automatically from the Rmd file and including a version
> > > could result in a stale copy.
> > > 
> > > Just on quick glance of the vignette - It seems like you set.seed in
> > > the vignette but that code is not exposed to the user - assuming this
> > > would be important to show so that a user could reproduce your work.
> > > 
> > > We look forward to getting this workflow active again and on the
> > > builder. 
> > > 
> > > Cheers, 
> > > 
> > > Lori Shepherd
> > > Bioconductor Core Team
> > > Roswell Park Cancer Institute
> > > Department of Biostatistics & Bioinformatics
> > > Elm & Carlton Streets
> > > Buffalo, New York 14263
> > > From: Bioc-devel  on behalf of
> > > Bernd Klaus 
> > > Sent: Thursday, May 3, 2018 3:03:20 AM
> > > To: bioc-devel
> > > Subject: [Bioc-devel] "Reviving" an existing