>>>>> Doran, Harold <hdo...@air.org>
>>>>>     on Tue, 13 Mar 2018 16:14:19 +0000 writes:

    > You’re right, it sure does. My suggestion causes it to fail when simplify = ‘array’

    > From: William Dunlap [mailto:wdun...@tibco.com]
    > Sent: Tuesday, March 13, 2018 12:11 PM
    > To: Doran, Harold <hdo...@air.org>
    > Cc: r-help@r-project.org
    > Subject: Re: [R] Possible Improvement to sapply

    > Wouldn't that change how simplify='array' is handled?

    >> str(sapply(1:3, function(x)diag(x,5,2), simplify="array"))
    > int [1:5, 1:2, 1:3] 1 0 0 0 0 0 1 0 0 0 ...
    >> str(sapply(1:3, function(x)diag(x,5,2), simplify=TRUE))
    > int [1:10, 1:3] 1 0 0 0 0 0 1 0 0 0 ...
    >> str(sapply(1:3, function(x)diag(x,5,2), simplify=FALSE))
    > List of 3
    > $ : int [1:5, 1:2] 1 0 0 0 0 0 1 0 0 0
    > $ : int [1:5, 1:2] 2 0 0 0 0 0 2 0 0 0
    > $ : int [1:5, 1:2] 3 0 0 0 0 0 3 0 0 0


    > Bill Dunlap
    > TIBCO Software
    > wdunlap tibco.com<http://tibco.com>

Yes, indeed, thank you Bill!

I sometimes marvel at how much the mental capacities of R core
are underestimated.  Of course, nobody is perfect, but the bugs
we produce are really more subtle than that ...  ;-)

Martin Maechler
R core  


    > On Tue, Mar 13, 2018 at 6:23 AM, Doran, Harold <hdo...@air.org> wrote:
    > While working with sapply, I noticed that the documentation states that the
    > simplify argument will yield a vector, matrix etc. "when possible". I was
    > curious how the code actually defines "when possible" and found this within
    > the function:

    > if (!identical(simplify, FALSE) && length(answer))

    > This seems superfluous to me, in particular this part:

    > !identical(simplify, FALSE)

    > The preceding code could be reduced to

    > if (simplify && length(answer))

    > and it would not need to execute the call to identical in order to
    > trigger the conditional execution, since the condition is already known
    > from the user's simplify = TRUE or FALSE input. I *think* the extra call
    > to identical is just unnecessary overhead in this instance.
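To make the comparison of the two conditions concrete, here is a minimal sketch that evaluates each one for the three kinds of value `simplify` is used with in this thread (`TRUE`, `FALSE`, and `"array"`). The names `simplify_values`, `current`, and `proposed` are only illustrative, and `length(answer)` is stood in for by `TRUE`. An error in the proposed form is recorded as `NA` rather than stopping the script.

```r
# Values of simplify that appear in this thread.
simplify_values <- list(TRUE, FALSE, "array")

# Condition used in base sapply: TRUE for anything that is not exactly FALSE.
current <- vapply(simplify_values,
                  function(s) !identical(s, FALSE),
                  logical(1))

# Proposed condition: applies && directly to simplify.
# tryCatch() maps an error to NA so that all three cases can be tabulated.
proposed <- vapply(simplify_values,
                   function(s) tryCatch(s && TRUE, error = function(e) NA),
                   logical(1))

print(data.frame(simplify = vapply(simplify_values, deparse, character(1)),
                 current  = current,
                 proposed = proposed))
```

The two forms agree for logical input. They differ for the character value, because `&&` does not accept a character operand, whereas `identical()` compares objects of any type.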

    > Take, for example, the following toy code and benchmark results, with a
    > small modification to sapply:

    > myList <- list(a = rnorm(100), b = rnorm(100))

    > answer <- lapply(X = myList, FUN = length)
    > simplify = TRUE

    > library(microbenchmark)

    > mySapply <- function (X, FUN, ..., simplify = TRUE, USE.NAMES = TRUE){
    >     FUN <- match.fun(FUN)
    >     answer <- lapply(X = X, FUN = FUN, ...)
    >     if (USE.NAMES && is.character(X) && is.null(names(answer)))
    >         names(answer) <- X
    >     if (simplify && length(answer))
    >         simplify2array(answer, higher = (simplify == "array"))
    >     else answer
    > }


    >> microbenchmark(sapply(myList, length), times = 10000L)
    > Unit: microseconds
    >                    expr    min     lq     mean median     uq    max neval
    >  sapply(myList, length) 14.156 15.572 16.67603 15.926 16.634 650.46 10000
    >> microbenchmark(mySapply(myList, length), times = 10000L)
    > Unit: microseconds
    >                      expr    min     lq     mean median     uq      max neval
    >  mySapply(myList, length) 13.095 14.864 16.02964 15.218 15.573 1671.804 10000

    > My benchmark timings show an improvement from that one small change,
    > although it is seemingly nominal. In my actual work, the sapply function
    > is called millions of times, and this additional overhead propagates to
    > some overall additional computing time.

    > I have done some limited testing on various real data to verify that
    > both variants of sapply (base R and my modified version) yield identical
    > objects when simplify is either TRUE or FALSE.
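A check along these lines can be extended to the character value `"array"` as well. The sketch below is only an illustration: `sapply_proposed` is a copy of the modified function from this message under a different, made-up name, `same_as_base` is a hypothetical helper, and the test function `diag(x, 5, 2)` is borrowed from Bill Dunlap's reply. A case where the proposed variant raises an error is reported as `NA`.

```r
# Copy of the proposed variant from this thread (the name is illustrative).
sapply_proposed <- function(X, FUN, ..., simplify = TRUE, USE.NAMES = TRUE) {
  FUN <- match.fun(FUN)
  answer <- lapply(X = X, FUN = FUN, ...)
  if (USE.NAMES && is.character(X) && is.null(names(answer)))
    names(answer) <- X
  if (simplify && length(answer))
    simplify2array(answer, higher = (simplify == "array"))
  else answer
}

# Each call returns a 5 x 2 matrix, so simplify = "array" matters here.
f <- function(x) diag(x, 5, 2)

# Hypothetical helper: TRUE if both variants return identical objects,
# FALSE if the objects differ, NA if the proposed variant signals an error.
same_as_base <- function(simplify) {
  expected <- sapply(1:3, f, simplify = simplify)
  actual <- tryCatch(sapply_proposed(1:3, f, simplify = simplify),
                     error = function(e) e)
  if (inherits(actual, "error")) NA else identical(expected, actual)
}

results <- vapply(list(TRUE, FALSE, "array"), same_as_base, logical(1))
print(results)
```

Testing only `TRUE` and `FALSE` cannot reveal the difference, since the two conditions agree for logical input. The third case is where the variants diverge.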

    > Perhaps someone else sees a counterexample where my proposed change
    > causes sapply not to behave as expected.

    > Harold

    > ______________________________________________
    > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
    > https://stat.ethz.ch/mailman/listinfo/r-help
    > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
    > and provide commented, minimal, self-contained, reproducible code.

