Re: [Rd] faster base::sequence

2010-12-16 Thread Hervé Pagès

Hi Romain,

FWIW I see at least 2 small differences in the way sequence_c()
behaves with respect to good old sequence(): zeros and names.

 sequence(c(a=5, b=0, c=2))
a1 a2 a3 a4 a5 c1 c2
 1  2  3  4  5  1  2

sequence_c() ignores the names and doesn't support zeros in the input.

Cheers,
H.


On 11/28/2010 01:56 AM, Romain Francois wrote:

Le 28/11/10 10:30, Prof Brian Ripley a écrit :

Is sequence used enough to warrant this? As the help page says

Note that ‘sequence - function(nvec) unlist(lapply(nvec,
seq_len))’ and it mainly exists in reverence to the very early
history of R.


I don't know. Would it be used more if it were more efficient ?


I regard it as unsafe to assume that NA_INTEGER will always be negative,
and bear in mind that at some point not so far off R integers (or at
least lengths) will need to be more than 32-bit.


sure. updated and dressed up as a patch.

I've made it a .Call because I'm not really comfortable with .Internal,
etc ...

Do you mean that I should also use something else instead of int and
int*. Is there some future proof typedef or macro for the type
associated with INTSXP ?



On Sun, 28 Nov 2010, Romain Francois wrote:


Hello,

Based on yesterday's R-help thread (help: program efficiency), and
following Bill's suggestions, it appeared that sequence:


sequence

function (nvec)
unlist(lapply(nvec, seq_len))
environment: namespace:base

could benefit from being written in C to avoid unnecessary memory
allocations.

I made this version using inline:

require( inline )
sequence_c - local( {
fx - cfunction( signature( x = integer), '
int n = length(x) ;
int* px = INTEGER(x) ;
int x_i, s = 0 ;
/* error checking */
for( int i=0; in; i++){
x_i = px[i] ;
/* this includes the check for NA */
if( x_i = 0 ) error( needs non negative integer ) ;
s += x_i ;
}

SEXP res = PROTECT( allocVector( INTSXP, s ) ) ;
int * p_res = INTEGER(res) ;
for( int i=0; in; i++){
x_i = px[i] ;
for( int j=0; jx_i; j++, p_res++)
*p_res = j+1 ;
}
UNPROTECT(1) ;
return res ;
' )
function( nvec ){
fx( as.integer(nvec) )
}
})


And here are some timings:


x - 1:1
system.time( a - sequence(x ) )

utilisateur système écoulé
0.191 0.108 0.298

system.time( b - sequence_c(x ) )

utilisateur système écoulé
0.060 0.063 0.122

identical( a, b )

[1] TRUE




system.time( for( i in 1:1) sequence(1:10) )

utilisateur système écoulé
0.119 0.000 0.119


system.time( for( i in 1:1) sequence_c(1:10) )

utilisateur système écoulé
0.019 0.000 0.019


I would write a proper patch if someone from R-core is willing to push
it.

Romain




__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel



--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fhcrc.org
Phone:  (206) 667-5791
Fax:(206) 667-1319

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] faster base::sequence

2010-11-28 Thread Romain Francois

Hello,

Based on yesterday's R-help thread (help: program efficiency), and 
following Bill's suggestions, it appeared that sequence:


 sequence
function (nvec)
unlist(lapply(nvec, seq_len))
environment: namespace:base

could benefit from being written in C to avoid unnecessary memory 
allocations.


I made this version using inline:

require( inline )
sequence_c - local( {
fx - cfunction( signature( x = integer), '
int n = length(x) ;
int* px = INTEGER(x) ;
int x_i, s = 0 ;
/* error checking */
for( int i=0; in; i++){
x_i = px[i] ;
/* this includes the check for NA */
if( x_i = 0 ) error( needs non negative integer ) ;
s += x_i ;
}

SEXP res = PROTECT( allocVector( INTSXP, s ) ) ;
int * p_res = INTEGER(res) ;
for( int i=0; in; i++){
x_i = px[i] ;
for( int j=0; jx_i; j++, p_res++)
*p_res = j+1 ;
}
UNPROTECT(1) ;
return res ;
' )
function( nvec ){
fx( as.integer(nvec) )
}
})


And here are some timings:

 x - 1:1
 system.time( a - sequence(x ) )
utilisateur système  écoulé
  0.191   0.108   0.298
 system.time( b - sequence_c(x ) )
utilisateur système  écoulé
  0.060   0.063   0.122
 identical( a, b )
[1] TRUE



 system.time( for( i in 1:1) sequence(1:10) )
utilisateur système  écoulé
  0.119   0.000   0.119

 system.time( for( i in 1:1) sequence_c(1:10) )
utilisateur système  écoulé
  0.019   0.000   0.019


I would write a proper patch if someone from R-core is willing to push it.

Romain

--
Romain Francois
Professional R Enthusiast
+33(0) 6 28 91 30 30
http://romainfrancois.blog.free.fr
|- http://bit.ly/9VOd3l : ZAT! 2010
|- http://bit.ly/c6DzuX : Impressionnism with R
`- http://bit.ly/czHPM7 : Rcpp Google tech talk on youtube

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] faster base::sequence

2010-11-28 Thread Prof Brian Ripley

Is sequence used enough to warrant this?  As the help page says

 Note that ‘sequence - function(nvec) unlist(lapply(nvec,
 seq_len))’ and it mainly exists in reverence to the very early
 history of R.

I regard it as unsafe to assume that NA_INTEGER will always be 
negative, and bear in mind that at some point not so far off R 
integers (or at least lengths) will need to be more than 32-bit.


On Sun, 28 Nov 2010, Romain Francois wrote:


Hello,

Based on yesterday's R-help thread (help: program efficiency), and following 
Bill's suggestions, it appeared that sequence:



sequence

function (nvec)
unlist(lapply(nvec, seq_len))
environment: namespace:base

could benefit from being written in C to avoid unnecessary memory 
allocations.


I made this version using inline:

require( inline )
sequence_c - local( {
   fx - cfunction( signature( x = integer), '
   int n = length(x) ;
   int* px = INTEGER(x) ;
   int x_i, s = 0 ;
   /* error checking */
   for( int i=0; in; i++){
   x_i = px[i] ;
   /* this includes the check for NA */
   if( x_i = 0 ) error( needs non negative integer ) ;
   s += x_i ;
   }

   SEXP res = PROTECT( allocVector( INTSXP, s ) ) ;
   int * p_res = INTEGER(res) ;
   for( int i=0; in; i++){
   x_i = px[i] ;
   for( int j=0; jx_i; j++, p_res++)
   *p_res = j+1 ;
   }
   UNPROTECT(1) ;
   return res ;
   ' )
   function( nvec ){
   fx( as.integer(nvec) )
   }
})


And here are some timings:


x - 1:1
system.time( a - sequence(x ) )

utilisateur système  écoulé
 0.191   0.108   0.298

system.time( b - sequence_c(x ) )

utilisateur système  écoulé
 0.060   0.063   0.122

identical( a, b )

[1] TRUE




system.time( for( i in 1:1) sequence(1:10) )

utilisateur système  écoulé
 0.119   0.000   0.119


system.time( for( i in 1:1) sequence_c(1:10) )

utilisateur système  écoulé
 0.019   0.000   0.019


I would write a proper patch if someone from R-core is willing to push it.

Romain

--
Romain Francois
Professional R Enthusiast
+33(0) 6 28 91 30 30
http://romainfrancois.blog.free.fr
|- http://bit.ly/9VOd3l : ZAT! 2010
|- http://bit.ly/c6DzuX : Impressionnism with R
`- http://bit.ly/czHPM7 : Rcpp Google tech talk on youtube

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel



--
Brian D. Ripley,  rip...@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UKFax:  +44 1865 272595__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] faster base::sequence

2010-11-28 Thread Romain Francois

Le 28/11/10 10:30, Prof Brian Ripley a écrit :

Is sequence used enough to warrant this? As the help page says

Note that ‘sequence - function(nvec) unlist(lapply(nvec,
seq_len))’ and it mainly exists in reverence to the very early
history of R.


I don't know. Would it be used more if it were more efficient ?


I regard it as unsafe to assume that NA_INTEGER will always be negative,
and bear in mind that at some point not so far off R integers (or at
least lengths) will need to be more than 32-bit.


sure. updated and dressed up as a patch.

I've made it a .Call because I'm not really comfortable with .Internal, 
etc ...


Do you mean that I should also use something else instead of int and 
int*. Is there some future proof typedef or macro for the type 
associated with INTSXP ?




On Sun, 28 Nov 2010, Romain Francois wrote:


Hello,

Based on yesterday's R-help thread (help: program efficiency), and
following Bill's suggestions, it appeared that sequence:


sequence

function (nvec)
unlist(lapply(nvec, seq_len))
environment: namespace:base

could benefit from being written in C to avoid unnecessary memory
allocations.

I made this version using inline:

require( inline )
sequence_c - local( {
fx - cfunction( signature( x = integer), '
int n = length(x) ;
int* px = INTEGER(x) ;
int x_i, s = 0 ;
/* error checking */
for( int i=0; in; i++){
x_i = px[i] ;
/* this includes the check for NA */
if( x_i = 0 ) error( needs non negative integer ) ;
s += x_i ;
}

SEXP res = PROTECT( allocVector( INTSXP, s ) ) ;
int * p_res = INTEGER(res) ;
for( int i=0; in; i++){
x_i = px[i] ;
for( int j=0; jx_i; j++, p_res++)
*p_res = j+1 ;
}
UNPROTECT(1) ;
return res ;
' )
function( nvec ){
fx( as.integer(nvec) )
}
})


And here are some timings:


x - 1:1
system.time( a - sequence(x ) )

utilisateur système écoulé
0.191 0.108 0.298

system.time( b - sequence_c(x ) )

utilisateur système écoulé
0.060 0.063 0.122

identical( a, b )

[1] TRUE




system.time( for( i in 1:1) sequence(1:10) )

utilisateur système écoulé
0.119 0.000 0.119


system.time( for( i in 1:1) sequence_c(1:10) )

utilisateur système écoulé
0.019 0.000 0.019


I would write a proper patch if someone from R-core is willing to push
it.

Romain


--
Romain Francois
Professional R Enthusiast
+33(0) 6 28 91 30 30
http://romainfrancois.blog.free.fr
|- http://bit.ly/9VOd3l : ZAT! 2010
|- http://bit.ly/c6DzuX : Impressionnism with R
`- http://bit.ly/czHPM7 : Rcpp Google tech talk on youtube

Index: src/library/base/R/seq.R
===
--- src/library/base/R/seq.R(revision 53680)
+++ src/library/base/R/seq.R(working copy)
@@ -85,4 +85,6 @@
 }
 
 ## In reverence to the very first versions of R which already had sequence():
-sequence - function(nvec) unlist(lapply(nvec, seq_len))
+# sequence - function(nvec) unlist(lapply(nvec, seq_len))
+sequence - function(nvec) .Call( sequence, as.integer(nvec), PACKAGE = 
base )
+
Index: src/main/registration.c
===
--- src/main/registration.c (revision 53680)
+++ src/main/registration.c (working copy)
@@ -245,6 +245,8 @@
 CALLDEF(bitwiseOr, 2),
 CALLDEF(bitwiseXor, 2),
 
+/* sequence */
+CALLDEF(sequence,1),
 {NULL, NULL, 0}
 };
 
Index: src/main/seq.c
===
--- src/main/seq.c  (revision 53680)
+++ src/main/seq.c  (working copy)
@@ -679,3 +679,28 @@
 
 return ans;
 }
+
+SEXP attribute_hidden sequence(SEXP x)
+{
+   R_len_t n = length(x), s = 0 ;
+   int *px = INTEGER(x) ;
+   int x_i ;
+   /* error checking */
+   for( int i=0; in; i++){
+   x_i = px[i] ;
+   if( x_i == NA_INTEGER || x_i = 0 ) 
+   error( _(argument must be coercible to non-negative integer) ) ;
+   s += x_i ;
+   }
+   
+   SEXP res = PROTECT( allocVector( INTSXP, s ) ) ;
+   int *p_res = INTEGER(res) ;
+   for( int i=0; in; i++){
+   x_i = px[i] ;
+   for( int j=0; jx_i; j++, p_res++)
+   *p_res = j+1 ;
+   }
+   UNPROTECT(1) ;
+   return res ;
+}
+
Index: src/main/basedecl.h
===
--- src/main/basedecl.h (revision 53680)
+++ src/main/basedecl.h (working copy)
@@ -114,3 +114,6 @@
 SEXP bitwiseAnd(SEXP, SEXP);
 SEXP bitwiseOr(SEXP, SEXP);
 SEXP bitwiseXor(SEXP, SEXP);
+
+SEXP sequence(SEXP);
+
__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] faster base::sequence

2010-11-28 Thread Prof Brian Ripley

On Sun, 28 Nov 2010, Romain Francois wrote:


Le 28/11/10 10:30, Prof Brian Ripley a écrit :

Is sequence used enough to warrant this? As the help page says

Note that ‘sequence - function(nvec) unlist(lapply(nvec,
seq_len))’ and it mainly exists in reverence to the very early
history of R.


I don't know. Would it be used more if it were more efficient ?


It is for you to make a compelling case for others to do work 
(maintain changed code) for your wish.



I regard it as unsafe to assume that NA_INTEGER will always be negative,
and bear in mind that at some point not so far off R integers (or at
least lengths) will need to be more than 32-bit.


sure. updated and dressed up as a patch.

I've made it a .Call because I'm not really comfortable with .Internal, etc 
...


Do you mean that I should also use something else instead of int and 
int*. Is there some future proof typedef or macro for the type associated 
with INTSXP ?


Not yet.  I was explaining why NA_INTEGER might change.




On Sun, 28 Nov 2010, Romain Francois wrote:


Hello,

Based on yesterday's R-help thread (help: program efficiency), and
following Bill's suggestions, it appeared that sequence:


sequence

function (nvec)
unlist(lapply(nvec, seq_len))
environment: namespace:base

could benefit from being written in C to avoid unnecessary memory
allocations.

I made this version using inline:

require( inline )
sequence_c - local( {
fx - cfunction( signature( x = integer), '
int n = length(x) ;
int* px = INTEGER(x) ;
int x_i, s = 0 ;
/* error checking */
for( int i=0; in; i++){
x_i = px[i] ;
/* this includes the check for NA */
if( x_i = 0 ) error( needs non negative integer ) ;
s += x_i ;
}

SEXP res = PROTECT( allocVector( INTSXP, s ) ) ;
int * p_res = INTEGER(res) ;
for( int i=0; in; i++){
x_i = px[i] ;
for( int j=0; jx_i; j++, p_res++)
*p_res = j+1 ;
}
UNPROTECT(1) ;
return res ;
' )
function( nvec ){
fx( as.integer(nvec) )
}
})


And here are some timings:


x - 1:1
system.time( a - sequence(x ) )

utilisateur système écoulé
0.191 0.108 0.298

system.time( b - sequence_c(x ) )

utilisateur système écoulé
0.060 0.063 0.122

identical( a, b )

[1] TRUE




system.time( for( i in 1:1) sequence(1:10) )

utilisateur système écoulé
0.119 0.000 0.119


system.time( for( i in 1:1) sequence_c(1:10) )

utilisateur système écoulé
0.019 0.000 0.019


I would write a proper patch if someone from R-core is willing to push
it.

Romain


--
Romain Francois
Professional R Enthusiast
+33(0) 6 28 91 30 30
http://romainfrancois.blog.free.fr
|- http://bit.ly/9VOd3l : ZAT! 2010
|- http://bit.ly/c6DzuX : Impressionnism with R
`- http://bit.ly/czHPM7 : Rcpp Google tech talk on youtube




--
Brian D. Ripley,  rip...@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UKFax:  +44 1865 272595__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] faster base::sequence

2010-11-28 Thread Romain Francois

Le 28/11/10 11:30, Prof Brian Ripley a écrit :

On Sun, 28 Nov 2010, Romain Francois wrote:


Le 28/11/10 10:30, Prof Brian Ripley a écrit :

Is sequence used enough to warrant this? As the help page says

Note that ‘sequence - function(nvec) unlist(lapply(nvec,
seq_len))’ and it mainly exists in reverence to the very early
history of R.


I don't know. Would it be used more if it were more efficient ?


It is for you to make a compelling case for others to do work (maintain
changed code) for your wish.


No trouble. The patch is there, if anyone finds it interesting or 
compelling, they will speak up I suppose.


Otherwise it is fine for me if it ends up in no man's land. I have the 
code, if I want to use it, I can squeeze it in a package.



I regard it as unsafe to assume that NA_INTEGER will always be negative,
and bear in mind that at some point not so far off R integers (or at
least lengths) will need to be more than 32-bit.


sure. updated and dressed up as a patch.

I've made it a .Call because I'm not really comfortable with
.Internal, etc ...



Do you mean that I should also use something else instead of int and
int*. Is there some future proof typedef or macro for the type
associated with INTSXP ?


Not yet. I was explaining why NA_INTEGER might change.


sure. thanks for the reminder.


On Sun, 28 Nov 2010, Romain Francois wrote:


Hello,

Based on yesterday's R-help thread (help: program efficiency), and
following Bill's suggestions, it appeared that sequence:


sequence

function (nvec)
unlist(lapply(nvec, seq_len))
environment: namespace:base

could benefit from being written in C to avoid unnecessary memory
allocations.

I made this version using inline:

require( inline )
sequence_c - local( {
fx - cfunction( signature( x = integer), '
int n = length(x) ;
int* px = INTEGER(x) ;
int x_i, s = 0 ;
/* error checking */
for( int i=0; in; i++){
x_i = px[i] ;
/* this includes the check for NA */
if( x_i = 0 ) error( needs non negative integer ) ;
s += x_i ;
}

SEXP res = PROTECT( allocVector( INTSXP, s ) ) ;
int * p_res = INTEGER(res) ;
for( int i=0; in; i++){
x_i = px[i] ;
for( int j=0; jx_i; j++, p_res++)
*p_res = j+1 ;
}
UNPROTECT(1) ;
return res ;
' )
function( nvec ){
fx( as.integer(nvec) )
}
})


And here are some timings:


x - 1:1
system.time( a - sequence(x ) )

utilisateur système écoulé
0.191 0.108 0.298

system.time( b - sequence_c(x ) )

utilisateur système écoulé
0.060 0.063 0.122

identical( a, b )

[1] TRUE




system.time( for( i in 1:1) sequence(1:10) )

utilisateur système écoulé
0.119 0.000 0.119


system.time( for( i in 1:1) sequence_c(1:10) )

utilisateur système écoulé
0.019 0.000 0.019


I would write a proper patch if someone from R-core is willing to push
it.

Romain


--
Romain Francois
Professional R Enthusiast
+33(0) 6 28 91 30 30
http://romainfrancois.blog.free.fr
|- http://bit.ly/9VOd3l : ZAT! 2010
|- http://bit.ly/c6DzuX : Impressionnism with R
`- http://bit.ly/czHPM7 : Rcpp Google tech talk on youtube







--
Romain Francois
Professional R Enthusiast
+33(0) 6 28 91 30 30
http://romainfrancois.blog.free.fr
|- http://bit.ly/9VOd3l : ZAT! 2010
|- http://bit.ly/c6DzuX : Impressionnism with R
`- http://bit.ly/czHPM7 : Rcpp Google tech talk on youtube

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel