Re: [Rd] Suggestion: 20% speed up of which() with two-character mod
HenrikB == Henrik Bengtsson [EMAIL PROTECTED] on Mon, 4 Aug 2008 21:14:12 -0700 writes: HenrikB Hi, HenrikB I just want to do a follow up this very simple HenrikB fix/correction/speedup/cleanup of the base::which() function. Here is HenrikB a diff: HenrikB diff src/library/base/R/which.R which.R HenrikB 21c21 HenrikB wh - seq_along(x)[ll - x !is.na(x)] HenrikB --- wh - seq_along(x)[x !is.na(x)] HenrikB 25c25 HenrikB names(wh) - names(x)[ll] HenrikB --- names(wh) - names(x)[wh] HenrikB FYI, the 'll' variable is not used elsewhere. I've been going through HenrikB this modifications several times and I cannot see any side effects. HenrikB Could someone of R core please commit this? I had added your proposition to my version of R-devel in order to commit it, and had wanted to do my own performance tests under different scenarios, but I had forgotten / postponed it. {I have more such things , notably the help.request() from Kate Mullen -- with quite a few of my own changes, not quite finished ... that will have to wait for after useR!2008 ..} In fact, it seems is pretty obvious that the version with [wh] instead of [ll] should be faster in most cases, and never slower, and so I do commit it now. Thank you Henrik, for the reminder. Martin HenrikB BTW, when one report diff:s, do you prefer to get it with or without HenrikB context information, e.g. -C 3? {My exact preference would depend on the size / style of the patch itself. It does not really matter, and as a general rule, I'd personally prefer '-u' (unified diffs which include context)} HenrikB /Henrik HenrikB On Fri, Jul 11, 2008 at 8:57 AM, Charles C. Berry [EMAIL PROTECTED] wrote: On Thu, 10 Jul 2008, Henrik Bengtsson wrote: Hi, by replacing 'll' with 'wh' in the source code for base::which() one gets ~20% speed up for *named logical vectors*. The amount of speedup depends on how sparse the TRUE values are. When the proportion of TRUEs gets small the speedup is more than twofold on my macbook. For high proportions of TRUE, the speedup is more like the 20% you cite. HTH, Chuck CURRENT CODE: which - function(x, arr.ind = FALSE) { if(!is.logical(x)) stop(argument to 'which' is not logical) wh - seq_along(x)[ll - x !is.na(x)] m - length(wh) dl - dim(x) if (is.null(dl) || !arr.ind) { names(wh) - names(x)[ll] } ... wh; } SUGGESTED CODE: (Remove 'll' and use 'wh') which2 - function(x, arr.ind = FALSE) { if(!is.logical(x)) stop(argument to 'which' is not logical) wh - seq_along(x)[x !is.na(x)] m - length(wh) dl - dim(x) if (is.null(dl) || !arr.ind) { names(wh) - names(x)[wh] } ... wh; } That's all. BENCHMARKING: # To measure both in same environment which1 - base::which; environment(which1) - globalenv(); # Needed? N - 1e6; set.seed(0xbeef); x - sample(c(TRUE, FALSE), size=N, replace=TRUE); names(x) - seq_along(x); B - 10; t1 - system.time({ for (bb in 1:B) idxs1 - which1(x); }); t2 - system.time({ for (bb in 1:B) idxs2 - which2(x); }); stopifnot(identical(idxs1, idxs2)); print(t1/t2); # Fair benchmarking t2 - system.time({ for (bb in 1:B) idxs2 - which2(x); }); t1 - system.time({ for (bb in 1:B) idxs1 - which1(x); }); print(t1/t2); ## usersystem elapsed ## 1.283186 1.052632 1.25 You get similar results if you put for loop outside the system.time() call (and sum up the timings). Cheers Henrik __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel Charles C. Berry(858) 534-2098 Dept of Family/Preventive Medicine E mailto:[EMAIL PROTECTED] UC San Diego http://famprevmed.ucsd.edu/faculty/cberry/ La Jolla, San Diego 92093-0901 HenrikB __ HenrikB R-devel@r-project.org mailing list HenrikB https://stat.ethz.ch/mailman/listinfo/r-devel __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] Suggestion: 20% speed up of which() with two-character mod
Hi, thanks for this. I'll use unified diff next time, i.e. diff -u current.R new.R /Henrik On Tue, Aug 5, 2008 at 5:54 AM, Martin Maechler [EMAIL PROTECTED] wrote: HenrikB == Henrik Bengtsson [EMAIL PROTECTED] on Mon, 4 Aug 2008 21:14:12 -0700 writes: HenrikB Hi, HenrikB I just want to do a follow up this very simple HenrikB fix/correction/speedup/cleanup of the base::which() function. Here is HenrikB a diff: HenrikB diff src/library/base/R/which.R which.R HenrikB 21c21 HenrikB wh - seq_along(x)[ll - x !is.na(x)] HenrikB --- wh - seq_along(x)[x !is.na(x)] HenrikB 25c25 HenrikB names(wh) - names(x)[ll] HenrikB --- names(wh) - names(x)[wh] HenrikB FYI, the 'll' variable is not used elsewhere. I've been going through HenrikB this modifications several times and I cannot see any side effects. HenrikB Could someone of R core please commit this? I had added your proposition to my version of R-devel in order to commit it, and had wanted to do my own performance tests under different scenarios, but I had forgotten / postponed it. {I have more such things , notably the help.request() from Kate Mullen -- with quite a few of my own changes, not quite finished ... that will have to wait for after useR!2008 ..} In fact, it seems is pretty obvious that the version with [wh] instead of [ll] should be faster in most cases, and never slower, and so I do commit it now. Thank you Henrik, for the reminder. Martin HenrikB BTW, when one report diff:s, do you prefer to get it with or without HenrikB context information, e.g. -C 3? {My exact preference would depend on the size / style of the patch itself. It does not really matter, and as a general rule, I'd personally prefer '-u' (unified diffs which include context)} HenrikB /Henrik HenrikB On Fri, Jul 11, 2008 at 8:57 AM, Charles C. Berry [EMAIL PROTECTED] wrote: On Thu, 10 Jul 2008, Henrik Bengtsson wrote: Hi, by replacing 'll' with 'wh' in the source code for base::which() one gets ~20% speed up for *named logical vectors*. The amount of speedup depends on how sparse the TRUE values are. When the proportion of TRUEs gets small the speedup is more than twofold on my macbook. For high proportions of TRUE, the speedup is more like the 20% you cite. HTH, Chuck CURRENT CODE: which - function(x, arr.ind = FALSE) { if(!is.logical(x)) stop(argument to 'which' is not logical) wh - seq_along(x)[ll - x !is.na(x)] m - length(wh) dl - dim(x) if (is.null(dl) || !arr.ind) { names(wh) - names(x)[ll] } ... wh; } SUGGESTED CODE: (Remove 'll' and use 'wh') which2 - function(x, arr.ind = FALSE) { if(!is.logical(x)) stop(argument to 'which' is not logical) wh - seq_along(x)[x !is.na(x)] m - length(wh) dl - dim(x) if (is.null(dl) || !arr.ind) { names(wh) - names(x)[wh] } ... wh; } That's all. BENCHMARKING: # To measure both in same environment which1 - base::which; environment(which1) - globalenv(); # Needed? N - 1e6; set.seed(0xbeef); x - sample(c(TRUE, FALSE), size=N, replace=TRUE); names(x) - seq_along(x); B - 10; t1 - system.time({ for (bb in 1:B) idxs1 - which1(x); }); t2 - system.time({ for (bb in 1:B) idxs2 - which2(x); }); stopifnot(identical(idxs1, idxs2)); print(t1/t2); # Fair benchmarking t2 - system.time({ for (bb in 1:B) idxs2 - which2(x); }); t1 - system.time({ for (bb in 1:B) idxs1 - which1(x); }); print(t1/t2); ## usersystem elapsed ## 1.283186 1.052632 1.25 You get similar results if you put for loop outside the system.time() call (and sum up the timings). Cheers Henrik __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel Charles C. Berry(858) 534-2098 Dept of Family/Preventive Medicine E mailto:[EMAIL PROTECTED] UC San Diego http://famprevmed.ucsd.edu/faculty/cberry/ La Jolla, San Diego 92093-0901 HenrikB __ HenrikB R-devel@r-project.org mailing list HenrikB https://stat.ethz.ch/mailman/listinfo/r-devel __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] Suggestion: 20% speed up of which() with two-character mod
Hi, I just want to do a follow up this very simple fix/correction/speedup/cleanup of the base::which() function. Here is a diff: diff src/library/base/R/which.R which.R 21c21 wh - seq_along(x)[ll - x !is.na(x)] --- wh - seq_along(x)[x !is.na(x)] 25c25 names(wh) - names(x)[ll] --- names(wh) - names(x)[wh] FYI, the 'll' variable is not used elsewhere. I've been going through this modifications several times and I cannot see any side effects. Could someone of R core please commit this? BTW, when one report diff:s, do you prefer to get it with or without context information, e.g. -C 3? /Henrik On Fri, Jul 11, 2008 at 8:57 AM, Charles C. Berry [EMAIL PROTECTED] wrote: On Thu, 10 Jul 2008, Henrik Bengtsson wrote: Hi, by replacing 'll' with 'wh' in the source code for base::which() one gets ~20% speed up for *named logical vectors*. The amount of speedup depends on how sparse the TRUE values are. When the proportion of TRUEs gets small the speedup is more than twofold on my macbook. For high proportions of TRUE, the speedup is more like the 20% you cite. HTH, Chuck CURRENT CODE: which - function(x, arr.ind = FALSE) { if(!is.logical(x)) stop(argument to 'which' is not logical) wh - seq_along(x)[ll - x !is.na(x)] m - length(wh) dl - dim(x) if (is.null(dl) || !arr.ind) { names(wh) - names(x)[ll] } ... wh; } SUGGESTED CODE: (Remove 'll' and use 'wh') which2 - function(x, arr.ind = FALSE) { if(!is.logical(x)) stop(argument to 'which' is not logical) wh - seq_along(x)[x !is.na(x)] m - length(wh) dl - dim(x) if (is.null(dl) || !arr.ind) { names(wh) - names(x)[wh] } ... wh; } That's all. BENCHMARKING: # To measure both in same environment which1 - base::which; environment(which1) - globalenv(); # Needed? N - 1e6; set.seed(0xbeef); x - sample(c(TRUE, FALSE), size=N, replace=TRUE); names(x) - seq_along(x); B - 10; t1 - system.time({ for (bb in 1:B) idxs1 - which1(x); }); t2 - system.time({ for (bb in 1:B) idxs2 - which2(x); }); stopifnot(identical(idxs1, idxs2)); print(t1/t2); # Fair benchmarking t2 - system.time({ for (bb in 1:B) idxs2 - which2(x); }); t1 - system.time({ for (bb in 1:B) idxs1 - which1(x); }); print(t1/t2); ## usersystem elapsed ## 1.283186 1.052632 1.25 You get similar results if you put for loop outside the system.time() call (and sum up the timings). Cheers Henrik __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel Charles C. Berry(858) 534-2098 Dept of Family/Preventive Medicine E mailto:[EMAIL PROTECTED] UC San Diego http://famprevmed.ucsd.edu/faculty/cberry/ La Jolla, San Diego 92093-0901 __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] Suggestion: 20% speed up of which() with two-character mod
On Thu, 10 Jul 2008, Henrik Bengtsson wrote: Hi, by replacing 'll' with 'wh' in the source code for base::which() one gets ~20% speed up for *named logical vectors*. The amount of speedup depends on how sparse the TRUE values are. When the proportion of TRUEs gets small the speedup is more than twofold on my macbook. For high proportions of TRUE, the speedup is more like the 20% you cite. HTH, Chuck CURRENT CODE: which - function(x, arr.ind = FALSE) { if(!is.logical(x)) stop(argument to 'which' is not logical) wh - seq_along(x)[ll - x !is.na(x)] m - length(wh) dl - dim(x) if (is.null(dl) || !arr.ind) { names(wh) - names(x)[ll] } ... wh; } SUGGESTED CODE: (Remove 'll' and use 'wh') which2 - function(x, arr.ind = FALSE) { if(!is.logical(x)) stop(argument to 'which' is not logical) wh - seq_along(x)[x !is.na(x)] m - length(wh) dl - dim(x) if (is.null(dl) || !arr.ind) { names(wh) - names(x)[wh] } ... wh; } That's all. BENCHMARKING: # To measure both in same environment which1 - base::which; environment(which1) - globalenv(); # Needed? N - 1e6; set.seed(0xbeef); x - sample(c(TRUE, FALSE), size=N, replace=TRUE); names(x) - seq_along(x); B - 10; t1 - system.time({ for (bb in 1:B) idxs1 - which1(x); }); t2 - system.time({ for (bb in 1:B) idxs2 - which2(x); }); stopifnot(identical(idxs1, idxs2)); print(t1/t2); # Fair benchmarking t2 - system.time({ for (bb in 1:B) idxs2 - which2(x); }); t1 - system.time({ for (bb in 1:B) idxs1 - which1(x); }); print(t1/t2); ## usersystem elapsed ## 1.283186 1.052632 1.25 You get similar results if you put for loop outside the system.time() call (and sum up the timings). Cheers Henrik __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel Charles C. Berry(858) 534-2098 Dept of Family/Preventive Medicine E mailto:[EMAIL PROTECTED] UC San Diego http://famprevmed.ucsd.edu/faculty/cberry/ La Jolla, San Diego 92093-0901 __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel