# Re: [R] Possible Improvement to sapply

You're right, it sure does. My suggestion causes it to fail when simplify = 'array'.

From: William Dunlap [mailto:wdun...@tibco.com]
Sent: Tuesday, March 13, 2018 12:11 PM
To: Doran, Harold <hdo...@air.org>
Cc: r-help@r-project.org
Subject: Re: [R] Possible Improvement to sapply

Wouldn't that change how simplify='array' is handled?

>  str(sapply(1:3, function(x)diag(x,5,2), simplify="array"))
int [1:5, 1:2, 1:3] 1 0 0 0 0 0 1 0 0 0 ...
>  str(sapply(1:3, function(x)diag(x,5,2), simplify=TRUE))
int [1:10, 1:3] 1 0 0 0 0 0 1 0 0 0 ...
>  str(sapply(1:3, function(x)diag(x,5,2), simplify=FALSE))
List of 3
 $ : int [1:5, 1:2] 1 0 0 0 0 0 1 0 0 0
 $ : int [1:5, 1:2] 2 0 0 0 0 0 2 0 0 0
 $ : int [1:5, 1:2] 3 0 0 0 0 0 3 0 0 0
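
The relationship between the three results above can be checked directly. This is a minimal sketch using only base R; the names `f`, `arr`, `mat`, and `lst` are illustrative and not part of the original transcript.

```r
# Same call as in the transcript above, once per simplify setting.
f <- function(x) diag(x, 5, 2)

arr <- sapply(1:3, f, simplify = "array")  # one 5 x 2 slice per element
mat <- sapply(1:3, f, simplify = TRUE)     # each 5 x 2 matrix flattened into a column
lst <- sapply(1:3, f, simplify = FALSE)    # plain list, one matrix per element

dim(arr)   # 5 2 3
dim(mat)   # 10 3

# Both simplified forms hold the same values in the same order;
# only the dim attribute differs.
same_values <- identical(as.vector(arr), as.vector(mat))

# Slice k of the array corresponds to the k-th list element.
same_slice <- identical(arr[, , 2], lst[[2]])
```

So simplify = "array" is a third setting, distinct from both TRUE and FALSE, and any change to the guard inside sapply has to let it through.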

Bill Dunlap
TIBCO Software
wdunlap tibco.com<http://tibco.com>

On Tue, Mar 13, 2018 at 6:23 AM, Doran, Harold
<hdo...@air.org> wrote:
While working with sapply, I noticed that the documentation says the simplify
argument will yield a vector, matrix etc. "when possible". I was curious how the
code actually defines "when possible" and see this within the function

This seems superfluous to me, in particular this part:

!identical(simplify, FALSE)

The preceding code could be reduced to

and it would not need to execute the call to identical in order to trigger the
conditional execution, which is known from the user's simplify = TRUE or FALSE
inputs. I *think* the extra call to identical is just unnecessary overhead in
this instance.
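
To see what each form of the test does, the two guards can be evaluated on their own. This is only a sketch under stated assumptions: `guard_base` and `guard_short` are hypothetical helper names, the `&& length(answer)` part of both guards is assumed rather than quoted, `guard_short` assumes the reduced condition uses `simplify` directly, and `answer` is a stand-in for the `lapply()` result.

```r
answer <- list(1L, 2L)  # stand-in for a non-empty lapply() result

# Guard containing the identical() call quoted above (the rest is assumed).
guard_base <- function(simplify) !identical(simplify, FALSE) && length(answer)

# Assumed shape of the reduced guard: use `simplify` directly.
guard_short <- function(simplify) simplify && length(answer)

guard_base(TRUE);  guard_short(TRUE)    # both TRUE
guard_base(FALSE); guard_short(FALSE)   # both FALSE

# `simplify` may also be the string "array". identical() accepts any object,
# whereas `&&` needs an operand it can interpret as logical.
base_array  <- guard_base("array")
short_array <- tryCatch(guard_short("array"),
                        error = function(e) conditionMessage(e))
```

For logical inputs the two guards agree. For a character value, `guard_short` stops with an error from `&&`, so `short_array` holds the error message rather than TRUE or FALSE.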

Take, for example, the following toy code, a small modification to sapply, and
the resulting benchmarks:

myList <- list(a = rnorm(100), b = rnorm(100))

answer <- lapply(X = myList, FUN = length)
simplify = TRUE

library(microbenchmark)

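mySapply <- function (X, FUN, ..., simplify = TRUE, USE.NAMES = TRUE){
    FUN <- match.fun(FUN)
    answer <- lapply(X = X, FUN = FUN, ...)
    if (USE.NAMES && is.character(X) && is.null(names(answer)))
        names(answer) <- X
    # modified test: use simplify directly, with no call to identical()
    if (simplify && length(answer))
        simplify2array(answer, higher = (simplify == "array"))
    else answer
}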

> microbenchmark(sapply(myList, length), times = 10000L)
Unit: microseconds
expr    min     lq     mean median     uq    max neval
sapply(myList, length) 14.156 15.572 16.67603 15.926 16.634 650.46 10000
> microbenchmark(mySapply(myList, length), times = 10000L)
Unit: microseconds
expr    min     lq     mean median     uq      max neval
mySapply(myList, length) 13.095 14.864 16.02964 15.218 15.573 1671.804 10000

My benchmark timings show an improvement from that small change alone, although
it is seemingly nominal. In my actual work, the sapply function is called

I have done some limited testing on various real data to verify that both
variants of sapply (base R and my modified version) yield identical objects
when simplify is either TRUE or FALSE.
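
A check of this kind can be sketched with `identical()`. The definition of `mySapply` below is a reconstruction, assuming that the only change from base `sapply` is a guard of the form `simplify && length(answer)`; the seed and the names `same_true` and `same_false` are illustrative.

```r
# Assumed reconstruction of the modified function (only the guard differs).
mySapply <- function(X, FUN, ..., simplify = TRUE, USE.NAMES = TRUE) {
  FUN <- match.fun(FUN)
  answer <- lapply(X = X, FUN = FUN, ...)
  if (USE.NAMES && is.character(X) && is.null(names(answer)))
    names(answer) <- X
  if (simplify && length(answer))
    simplify2array(answer, higher = (simplify == "array"))
  else answer
}

set.seed(1)  # arbitrary seed, for reproducibility only
myList <- list(a = rnorm(100), b = rnorm(100))

# Compare the two variants for both logical settings of simplify.
same_true  <- identical(sapply(myList, length, simplify = TRUE),
                        mySapply(myList, length, simplify = TRUE))
same_false <- identical(sapply(myList, length, simplify = FALSE),
                        mySapply(myList, length, simplify = FALSE))
c(same_true, same_false)
```

This covers only the two logical values; simplify = "array" is a further documented setting and needs its own test.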

Perhaps someone else sees a counterexample where my proposed fix causes sapply
not to behave as expected.

Harold

______________________________________________
R-help@r-project.org mailing list -- To
UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help