Re: [Rd] Suggestion: 20% speed up of which() with two-character mod

2008-08-05 Thread Martin Maechler
 HenrikB == Henrik Bengtsson [EMAIL PROTECTED]
 on Mon, 4 Aug 2008 21:14:12 -0700 writes:

HenrikB Hi,

HenrikB I just want to follow up on this very simple
HenrikB fix/correction/speedup/cleanup of the base::which() function.
HenrikB Here is a diff:

HenrikB diff src/library/base/R/which.R which.R
HenrikB 21c21
HenrikB <     wh <- seq_along(x)[ll <- x & !is.na(x)]
HenrikB ---
HenrikB >     wh <- seq_along(x)[x & !is.na(x)]
HenrikB 25c25
HenrikB <         names(wh) <- names(x)[ll]
HenrikB ---
HenrikB >         names(wh) <- names(x)[wh]

HenrikB FYI, the 'll' variable is not used elsewhere.  I've been going through
HenrikB these modifications several times and I cannot see any side effects.

HenrikB Could someone of R core please commit this?

I had added your proposition to my version of R-devel in order
to commit it, and had wanted to do my own performance tests
under different scenarios, but I had forgotten / postponed it.
{I have more such things, notably the help.request() from Kate
 Mullen -- with quite a few of my own changes, not quite
 finished ... that will have to wait until after useR!2008 ..}

In fact, it seems pretty obvious that the version with [wh]
instead of [ll] should be faster in most cases, and never
slower,
and so I do commit it now.
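
A small illustration of why the two subscripts are interchangeable, assuming a made-up five-element vector (its values and names are hypothetical, not taken from the thread). 'll' is a logical mask as long as x, whereas 'wh' holds only the positions of the TRUE elements, so indexing names(x) by 'wh' has less to go through when TRUEs are rare.

```r
# Hypothetical example vector; values and names are illustrative only.
x <- c(a = TRUE, b = FALSE, c = NA, d = TRUE, e = FALSE)

ll <- x & !is.na(x)      # logical mask, same length as x (NA becomes FALSE)
wh <- seq_along(x)[ll]   # integer positions of the TRUE elements

# Both subscripts select the same names.
names_by_mask  <- names(x)[ll]
names_by_index <- names(x)[wh]
stopifnot(identical(names_by_mask, names_by_index))

print(wh)
print(names_by_index)
```

This is only a sketch of the idea behind the patch; it says nothing about actual timings.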

Thank you Henrik, for the reminder.

Martin

HenrikB BTW, when one reports diffs, do you prefer to get them with or without
HenrikB context information, e.g. -C 3?

{My exact preference would depend on the size / style of the
 patch itself. It does not really matter, and as a general rule,
 I'd personally prefer '-u' (unified diffs which include context)}

HenrikB /Henrik

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Suggestion: 20% speed up of which() with two-character mod

2008-08-05 Thread Henrik Bengtsson
Hi,

thanks for this.  I'll use unified diff next time, i.e.

  diff -u current.R new.R

/Henrik



Re: [Rd] Suggestion: 20% speed up of which() with two-character mod

2008-08-04 Thread Henrik Bengtsson
Hi,

I just want to follow up on this very simple
fix/correction/speedup/cleanup of the base::which() function.  Here is
a diff:

diff src/library/base/R/which.R which.R
21c21
<     wh <- seq_along(x)[ll <- x & !is.na(x)]
---
>     wh <- seq_along(x)[x & !is.na(x)]
25c25
<         names(wh) <- names(x)[ll]
---
>         names(wh) <- names(x)[wh]

FYI, the 'll' variable is not used elsewhere.  I've been going through
these modifications several times and I cannot see any side effects.

Could someone of R core please commit this?
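
One way to spot-check the "no side effects" claim is to compare the two variants on a few corner cases. The sketch below assumes cut-down versions of the functions (which_old and which_new are hypothetical names, and the arr.ind branch elided in the thread is left out), and the test inputs are made up for illustration.

```r
# Old and new variants, restricted to the part shown in the thread.
which_old <- function(x) {
  if (!is.logical(x)) stop("argument to 'which' is not logical")
  wh <- seq_along(x)[ll <- x & !is.na(x)]
  names(wh) <- names(x)[ll]
  wh
}
which_new <- function(x) {
  if (!is.logical(x)) stop("argument to 'which' is not logical")
  wh <- seq_along(x)[x & !is.na(x)]
  names(wh) <- names(x)[wh]
  wh
}

# Hypothetical inputs covering NAs, missing names and empty results.
cases <- list(
  named     = c(a = TRUE, b = FALSE, c = NA, d = TRUE),
  unnamed   = c(TRUE, NA, FALSE, TRUE),
  all_false = c(a = FALSE, b = FALSE),
  all_na    = c(a = NA, b = NA),
  empty     = logical(0)
)

# TRUE for each case where both variants return identical objects.
same <- vapply(cases,
               function(x) identical(which_old(x), which_new(x)),
               logical(1))
print(same)
stopifnot(all(same))
```

Agreement on these cases is evidence rather than proof; it does not cover arrays with arr.ind = TRUE.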

BTW, when one reports diffs, do you prefer to get them with or without
context information, e.g. -C 3?

/Henrik



Re: [Rd] Suggestion: 20% speed up of which() with two-character mod

2008-07-11 Thread Charles C. Berry

On Thu, 10 Jul 2008, Henrik Bengtsson wrote:


Hi,

by replacing 'll' with 'wh' in the source code for base::which() one
gets ~20% speed up for *named logical vectors*.



The amount of speedup depends on how sparse the TRUE values are.

When the proportion of TRUEs gets small the speedup is more than twofold 
on my macbook. For high proportions of TRUE, the speedup is more like the 
20% you cite.
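
A rough sketch of how this dependence on sparsity could be checked. The vector length, repetition count and proportions of TRUE are arbitrary choices, and which1()/which2() here are cut-down stand-ins for the two versions quoted below (vector case only, elided arr.ind branch omitted). No timings are shown because they vary with machine and R version.

```r
# Cut-down stand-ins for the old and the proposed version.
which1 <- function(x) {
  wh <- seq_along(x)[ll <- x & !is.na(x)]
  names(wh) <- names(x)[ll]
  wh
}
which2 <- function(x) {
  wh <- seq_along(x)[x & !is.na(x)]
  names(wh) <- names(x)[wh]
  wh
}

N <- 1e5                      # arbitrary length (smaller than in the post)
B <- 5                        # arbitrary number of repetitions
props <- c(0.01, 0.5, 0.99)   # arbitrary proportions of TRUE
set.seed(1)

res <- do.call(rbind, lapply(props, function(p) {
  x <- sample(c(TRUE, FALSE), size = N, replace = TRUE, prob = c(p, 1 - p))
  names(x) <- seq_along(x)
  stopifnot(identical(which1(x), which2(x)))
  t_old <- system.time(for (bb in 1:B) which1(x))[["elapsed"]]
  t_new <- system.time(for (bb in 1:B) which2(x))[["elapsed"]]
  data.frame(prop_true = p, old = t_old, new = t_new)
}))
print(res)   # elapsed seconds per version and proportion of TRUE
```

Elapsed times are reported side by side rather than as a ratio, since very short runs can round to zero.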


HTH,

Chuck



CURRENT CODE:

which <- function(x, arr.ind = FALSE)
{
   if(!is.logical(x))
      stop("argument to 'which' is not logical")
   wh <- seq_along(x)[ll <- x & !is.na(x)]
   m <- length(wh)
   dl <- dim(x)
   if (is.null(dl) || !arr.ind) {
      names(wh) <- names(x)[ll]
   }
   # ... (rest of the function elided in the original post)
   wh;
}

SUGGESTED CODE: (Remove 'll' and use 'wh')

which2 <- function(x, arr.ind = FALSE)
{
   if(!is.logical(x))
      stop("argument to 'which' is not logical")
   wh <- seq_along(x)[x & !is.na(x)]
   m <- length(wh)
   dl <- dim(x)
   if (is.null(dl) || !arr.ind) {
      names(wh) <- names(x)[wh]
   }
   # ... (rest of the function elided in the original post)
   wh;
}

That's all.

BENCHMARKING:

# To measure both in same environment
which1 <- base::which;
environment(which1) <- globalenv();  # Needed?

N <- 1e6;
set.seed(0xbeef);
x <- sample(c(TRUE, FALSE), size=N, replace=TRUE);
names(x) <- seq_along(x);
B <- 10;
t1 <- system.time({ for (bb in 1:B) idxs1 <- which1(x); });
t2 <- system.time({ for (bb in 1:B) idxs2 <- which2(x); });
stopifnot(identical(idxs1, idxs2));
print(t1/t2);
# Fair benchmarking
t2 <- system.time({ for (bb in 1:B) idxs2 <- which2(x); });
t1 <- system.time({ for (bb in 1:B) idxs1 <- which1(x); });
print(t1/t2);
##       user     system    elapsed
##   1.283186   1.052632   1.25

You get similar results if you put the for loop outside the system.time()
call (and sum up the timings).
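
The variant with the loop outside system.time() might look like the following sketch. It assumes a vector-only stand-in for which2() (the elided arr.ind branch is omitted) and a smaller N than above so that it runs quickly; per_call is a hypothetical name.

```r
# Vector-only stand-in for the proposed version.
which2 <- function(x) {
  wh <- seq_along(x)[x & !is.na(x)]
  names(wh) <- names(x)[wh]
  wh
}

N <- 1e5   # smaller than the 1e6 used in the post, to keep the run short
B <- 10
set.seed(0xbeef)
x <- sample(c(TRUE, FALSE), size = N, replace = TRUE)
names(x) <- seq_along(x)

# Time each call on its own, then sum the per-call elapsed times.
per_call <- vapply(seq_len(B),
                   function(bb) system.time(which2(x))[["elapsed"]],
                   numeric(1))
print(sum(per_call))
```

Timing each call separately adds a little overhead per measurement, so the totals are only roughly comparable with the single system.time() call around the whole loop.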

Cheers

Henrik




Charles C. Berry                            (858) 534-2098
                                            Dept of Family/Preventive Medicine
E mailto:[EMAIL PROTECTED]                  UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901
