Wouldn't that change how simplify='array' is handled?

  > str(sapply(1:3, function(x)diag(x,5,2), simplify="array"))
   int [1:5, 1:2, 1:3] 1 0 0 0 0 0 1 0 0 0 ...
  > str(sapply(1:3, function(x)diag(x,5,2), simplify=TRUE))
   int [1:10, 1:3] 1 0 0 0 0 0 1 0 0 0 ...
  > str(sapply(1:3, function(x)diag(x,5,2), simplify=FALSE))
  List of 3
   $ : int [1:5, 1:2] 1 0 0 0 0 0 1 0 0 0
   $ : int [1:5, 1:2] 2 0 0 0 0 0 2 0 0 0
   $ : int [1:5, 1:2] 3 0 0 0 0 0 3 0 0 0
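
To make the concern concrete, here is a minimal sketch that runs the mySapply proposed in the quoted message with simplify="array". The function body is copied from that message, and the helper f is just a name introduced here for the diag() call used above. As far as I can tell, a character string is not a valid operand for &&, so the modified condition should signal an error instead of returning a 3-d array; the sketch catches the error rather than assuming its exact wording.

```r
# mySapply as proposed in the quoted message (copied for self-containment).
mySapply <- function(X, FUN, ..., simplify = TRUE, USE.NAMES = TRUE) {
  FUN <- match.fun(FUN)
  answer <- lapply(X = X, FUN = FUN, ...)
  if (USE.NAMES && is.character(X) && is.null(names(answer)))
    names(answer) <- X
  if (simplify && length(answer))   # simplify may be the string "array" here
    simplify2array(answer, higher = (simplify == "array"))
  else answer
}

# Hypothetical helper name; same computation as in the examples above.
f <- function(x) diag(x, 5, 2)

# Base sapply accepts simplify = "array" and returns a 3-d array.
base_dim <- dim(sapply(1:3, f, simplify = "array"))

# The modified version is wrapped in tryCatch so that a failure is recorded
# as a condition object instead of stopping the script.
modified <- tryCatch(mySapply(1:3, f, simplify = "array"),
                     error = function(e) e)

print(base_dim)
print(inherits(modified, "error"))
```

With simplify=TRUE or simplify=FALSE the two versions should agree, which would explain why testing only those two values did not show a difference.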

Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Tue, Mar 13, 2018 at 6:23 AM, Doran, Harold <hdo...@air.org> wrote:
> While working with sapply, I noticed the documentation states that the
> simplify argument will yield a vector, matrix etc "when possible". I was
> curious how the code actually defines "when possible" and found this in the
> function:
>
> if (!identical(simplify, FALSE) && length(answer))
>
> This seems superfluous to me, in particular this part:
>
> !identical(simplify, FALSE)
>
> The preceding code could be reduced to
>
> if (simplify && length(answer))
>
> and it would not need to execute the call to identical in order to trigger
> the conditional execution, which is known from the user's simplify = TRUE
> or FALSE inputs. I *think* the extra call to identical is just unnecessary
> overhead in this instance.
>
> Take, for example, the following toy code, a small modification to sapply,
> and the benchmark results:
>
> myList <- list(a = rnorm(100), b = rnorm(100))
>
> answer <- lapply(X = myList, FUN = length)
> simplify = TRUE
>
> library(microbenchmark)
>
> mySapply <- function (X, FUN, ..., simplify = TRUE, USE.NAMES = TRUE){
>     FUN <- match.fun(FUN)
>     answer <- lapply(X = X, FUN = FUN, ...)
>     if (USE.NAMES && is.character(X) && is.null(names(answer)))
>         names(answer) <- X
>     if (simplify && length(answer))
>         simplify2array(answer, higher = (simplify == "array"))
>     else answer
> }
>
>
>
> microbenchmark(sapply(myList, length), times = 10000L)
> Unit: microseconds
>                    expr    min     lq     mean median     uq    max neval
>  sapply(myList, length) 14.156 15.572 16.67603 15.926 16.634 650.46 10000
>
> microbenchmark(mySapply(myList, length), times = 10000L)
> Unit: microseconds
>                      expr    min     lq     mean median     uq      max neval
>  mySapply(myList, length) 13.095 14.864 16.02964 15.218 15.573 1671.804 10000
>
> My benchmark timings show a small, seemingly nominal, improvement from that
> one change alone. In my actual work, however, sapply is called millions of
> times, and this overhead adds up to some additional overall computing
> time.
>
> I have done some limited testing on various real data to verify that both
> variants of sapply (base R and my modified version) yield identical objects
> when simplify is either TRUE or FALSE.
>
> Perhaps someone else sees a counterexample where my proposed fix causes
> sapply not to behave as expected.
>
> Harold