Re: [Rd] faster base::sequence
Hi Romain, FWIW I see at least 2 small differences in the way sequence_c() behaves with respect to good old sequence(): zeros and names. sequence(c(a=5, b=0, c=2)) a1 a2 a3 a4 a5 c1 c2 1 2 3 4 5 1 2 sequence_c() ignores the names and doesn't support zeros in the input. Cheers, H. On 11/28/2010 01:56 AM, Romain Francois wrote: Le 28/11/10 10:30, Prof Brian Ripley a écrit : Is sequence used enough to warrant this? As the help page says Note that ‘sequence - function(nvec) unlist(lapply(nvec, seq_len))’ and it mainly exists in reverence to the very early history of R. I don't know. Would it be used more if it were more efficient ? I regard it as unsafe to assume that NA_INTEGER will always be negative, and bear in mind that at some point not so far off R integers (or at least lengths) will need to be more than 32-bit. sure. updated and dressed up as a patch. I've made it a .Call because I'm not really comfortable with .Internal, etc ... Do you mean that I should also use something else instead of int and int*. Is there some future proof typedef or macro for the type associated with INTSXP ? On Sun, 28 Nov 2010, Romain Francois wrote: Hello, Based on yesterday's R-help thread (help: program efficiency), and following Bill's suggestions, it appeared that sequence: sequence function (nvec) unlist(lapply(nvec, seq_len)) environment: namespace:base could benefit from being written in C to avoid unnecessary memory allocations. I made this version using inline: require( inline ) sequence_c - local( { fx - cfunction( signature( x = integer), ' int n = length(x) ; int* px = INTEGER(x) ; int x_i, s = 0 ; /* error checking */ for( int i=0; in; i++){ x_i = px[i] ; /* this includes the check for NA */ if( x_i = 0 ) error( needs non negative integer ) ; s += x_i ; } SEXP res = PROTECT( allocVector( INTSXP, s ) ) ; int * p_res = INTEGER(res) ; for( int i=0; in; i++){ x_i = px[i] ; for( int j=0; jx_i; j++, p_res++) *p_res = j+1 ; } UNPROTECT(1) ; return res ; ' ) function( nvec ){ fx( as.integer(nvec) ) } }) And here are some timings: x - 1:1 system.time( a - sequence(x ) ) utilisateur système écoulé 0.191 0.108 0.298 system.time( b - sequence_c(x ) ) utilisateur système écoulé 0.060 0.063 0.122 identical( a, b ) [1] TRUE system.time( for( i in 1:1) sequence(1:10) ) utilisateur système écoulé 0.119 0.000 0.119 system.time( for( i in 1:1) sequence_c(1:10) ) utilisateur système écoulé 0.019 0.000 0.019 I would write a proper patch if someone from R-core is willing to push it. Romain __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel -- Hervé Pagès Program in Computational Biology Division of Public Health Sciences Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N, M2-B876 P.O. Box 19024 Seattle, WA 98109-1024 E-mail: hpa...@fhcrc.org Phone: (206) 667-5791 Fax:(206) 667-1319 __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
[Rd] faster base::sequence
Hello, Based on yesterday's R-help thread (help: program efficiency), and following Bill's suggestions, it appeared that sequence: sequence function (nvec) unlist(lapply(nvec, seq_len)) environment: namespace:base could benefit from being written in C to avoid unnecessary memory allocations. I made this version using inline: require( inline ) sequence_c - local( { fx - cfunction( signature( x = integer), ' int n = length(x) ; int* px = INTEGER(x) ; int x_i, s = 0 ; /* error checking */ for( int i=0; in; i++){ x_i = px[i] ; /* this includes the check for NA */ if( x_i = 0 ) error( needs non negative integer ) ; s += x_i ; } SEXP res = PROTECT( allocVector( INTSXP, s ) ) ; int * p_res = INTEGER(res) ; for( int i=0; in; i++){ x_i = px[i] ; for( int j=0; jx_i; j++, p_res++) *p_res = j+1 ; } UNPROTECT(1) ; return res ; ' ) function( nvec ){ fx( as.integer(nvec) ) } }) And here are some timings: x - 1:1 system.time( a - sequence(x ) ) utilisateur système écoulé 0.191 0.108 0.298 system.time( b - sequence_c(x ) ) utilisateur système écoulé 0.060 0.063 0.122 identical( a, b ) [1] TRUE system.time( for( i in 1:1) sequence(1:10) ) utilisateur système écoulé 0.119 0.000 0.119 system.time( for( i in 1:1) sequence_c(1:10) ) utilisateur système écoulé 0.019 0.000 0.019 I would write a proper patch if someone from R-core is willing to push it. Romain -- Romain Francois Professional R Enthusiast +33(0) 6 28 91 30 30 http://romainfrancois.blog.free.fr |- http://bit.ly/9VOd3l : ZAT! 2010 |- http://bit.ly/c6DzuX : Impressionnism with R `- http://bit.ly/czHPM7 : Rcpp Google tech talk on youtube __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] faster base::sequence
Is sequence used enough to warrant this? As the help page says Note that ‘sequence - function(nvec) unlist(lapply(nvec, seq_len))’ and it mainly exists in reverence to the very early history of R. I regard it as unsafe to assume that NA_INTEGER will always be negative, and bear in mind that at some point not so far off R integers (or at least lengths) will need to be more than 32-bit. On Sun, 28 Nov 2010, Romain Francois wrote: Hello, Based on yesterday's R-help thread (help: program efficiency), and following Bill's suggestions, it appeared that sequence: sequence function (nvec) unlist(lapply(nvec, seq_len)) environment: namespace:base could benefit from being written in C to avoid unnecessary memory allocations. I made this version using inline: require( inline ) sequence_c - local( { fx - cfunction( signature( x = integer), ' int n = length(x) ; int* px = INTEGER(x) ; int x_i, s = 0 ; /* error checking */ for( int i=0; in; i++){ x_i = px[i] ; /* this includes the check for NA */ if( x_i = 0 ) error( needs non negative integer ) ; s += x_i ; } SEXP res = PROTECT( allocVector( INTSXP, s ) ) ; int * p_res = INTEGER(res) ; for( int i=0; in; i++){ x_i = px[i] ; for( int j=0; jx_i; j++, p_res++) *p_res = j+1 ; } UNPROTECT(1) ; return res ; ' ) function( nvec ){ fx( as.integer(nvec) ) } }) And here are some timings: x - 1:1 system.time( a - sequence(x ) ) utilisateur système écoulé 0.191 0.108 0.298 system.time( b - sequence_c(x ) ) utilisateur système écoulé 0.060 0.063 0.122 identical( a, b ) [1] TRUE system.time( for( i in 1:1) sequence(1:10) ) utilisateur système écoulé 0.119 0.000 0.119 system.time( for( i in 1:1) sequence_c(1:10) ) utilisateur système écoulé 0.019 0.000 0.019 I would write a proper patch if someone from R-core is willing to push it. Romain -- Romain Francois Professional R Enthusiast +33(0) 6 28 91 30 30 http://romainfrancois.blog.free.fr |- http://bit.ly/9VOd3l : ZAT! 2010 |- http://bit.ly/c6DzuX : Impressionnism with R `- http://bit.ly/czHPM7 : Rcpp Google tech talk on youtube __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel -- Brian D. Ripley, rip...@stats.ox.ac.uk Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UKFax: +44 1865 272595__ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] faster base::sequence
Le 28/11/10 10:30, Prof Brian Ripley a écrit : Is sequence used enough to warrant this? As the help page says Note that ‘sequence - function(nvec) unlist(lapply(nvec, seq_len))’ and it mainly exists in reverence to the very early history of R. I don't know. Would it be used more if it were more efficient ? I regard it as unsafe to assume that NA_INTEGER will always be negative, and bear in mind that at some point not so far off R integers (or at least lengths) will need to be more than 32-bit. sure. updated and dressed up as a patch. I've made it a .Call because I'm not really comfortable with .Internal, etc ... Do you mean that I should also use something else instead of int and int*. Is there some future proof typedef or macro for the type associated with INTSXP ? On Sun, 28 Nov 2010, Romain Francois wrote: Hello, Based on yesterday's R-help thread (help: program efficiency), and following Bill's suggestions, it appeared that sequence: sequence function (nvec) unlist(lapply(nvec, seq_len)) environment: namespace:base could benefit from being written in C to avoid unnecessary memory allocations. I made this version using inline: require( inline ) sequence_c - local( { fx - cfunction( signature( x = integer), ' int n = length(x) ; int* px = INTEGER(x) ; int x_i, s = 0 ; /* error checking */ for( int i=0; in; i++){ x_i = px[i] ; /* this includes the check for NA */ if( x_i = 0 ) error( needs non negative integer ) ; s += x_i ; } SEXP res = PROTECT( allocVector( INTSXP, s ) ) ; int * p_res = INTEGER(res) ; for( int i=0; in; i++){ x_i = px[i] ; for( int j=0; jx_i; j++, p_res++) *p_res = j+1 ; } UNPROTECT(1) ; return res ; ' ) function( nvec ){ fx( as.integer(nvec) ) } }) And here are some timings: x - 1:1 system.time( a - sequence(x ) ) utilisateur système écoulé 0.191 0.108 0.298 system.time( b - sequence_c(x ) ) utilisateur système écoulé 0.060 0.063 0.122 identical( a, b ) [1] TRUE system.time( for( i in 1:1) sequence(1:10) ) utilisateur système écoulé 0.119 0.000 0.119 system.time( for( i in 1:1) sequence_c(1:10) ) utilisateur système écoulé 0.019 0.000 0.019 I would write a proper patch if someone from R-core is willing to push it. Romain -- Romain Francois Professional R Enthusiast +33(0) 6 28 91 30 30 http://romainfrancois.blog.free.fr |- http://bit.ly/9VOd3l : ZAT! 2010 |- http://bit.ly/c6DzuX : Impressionnism with R `- http://bit.ly/czHPM7 : Rcpp Google tech talk on youtube Index: src/library/base/R/seq.R === --- src/library/base/R/seq.R(revision 53680) +++ src/library/base/R/seq.R(working copy) @@ -85,4 +85,6 @@ } ## In reverence to the very first versions of R which already had sequence(): -sequence - function(nvec) unlist(lapply(nvec, seq_len)) +# sequence - function(nvec) unlist(lapply(nvec, seq_len)) +sequence - function(nvec) .Call( sequence, as.integer(nvec), PACKAGE = base ) + Index: src/main/registration.c === --- src/main/registration.c (revision 53680) +++ src/main/registration.c (working copy) @@ -245,6 +245,8 @@ CALLDEF(bitwiseOr, 2), CALLDEF(bitwiseXor, 2), +/* sequence */ +CALLDEF(sequence,1), {NULL, NULL, 0} }; Index: src/main/seq.c === --- src/main/seq.c (revision 53680) +++ src/main/seq.c (working copy) @@ -679,3 +679,28 @@ return ans; } + +SEXP attribute_hidden sequence(SEXP x) +{ + R_len_t n = length(x), s = 0 ; + int *px = INTEGER(x) ; + int x_i ; + /* error checking */ + for( int i=0; in; i++){ + x_i = px[i] ; + if( x_i == NA_INTEGER || x_i = 0 ) + error( _(argument must be coercible to non-negative integer) ) ; + s += x_i ; + } + + SEXP res = PROTECT( allocVector( INTSXP, s ) ) ; + int *p_res = INTEGER(res) ; + for( int i=0; in; i++){ + x_i = px[i] ; + for( int j=0; jx_i; j++, p_res++) + *p_res = j+1 ; + } + UNPROTECT(1) ; + return res ; +} + Index: src/main/basedecl.h === --- src/main/basedecl.h (revision 53680) +++ src/main/basedecl.h (working copy) @@ -114,3 +114,6 @@ SEXP bitwiseAnd(SEXP, SEXP); SEXP bitwiseOr(SEXP, SEXP); SEXP bitwiseXor(SEXP, SEXP); + +SEXP sequence(SEXP); + __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] faster base::sequence
On Sun, 28 Nov 2010, Romain Francois wrote: Le 28/11/10 10:30, Prof Brian Ripley a écrit : Is sequence used enough to warrant this? As the help page says Note that ‘sequence - function(nvec) unlist(lapply(nvec, seq_len))’ and it mainly exists in reverence to the very early history of R. I don't know. Would it be used more if it were more efficient ? It is for you to make a compelling case for others to do work (maintain changed code) for your wish. I regard it as unsafe to assume that NA_INTEGER will always be negative, and bear in mind that at some point not so far off R integers (or at least lengths) will need to be more than 32-bit. sure. updated and dressed up as a patch. I've made it a .Call because I'm not really comfortable with .Internal, etc ... Do you mean that I should also use something else instead of int and int*. Is there some future proof typedef or macro for the type associated with INTSXP ? Not yet. I was explaining why NA_INTEGER might change. On Sun, 28 Nov 2010, Romain Francois wrote: Hello, Based on yesterday's R-help thread (help: program efficiency), and following Bill's suggestions, it appeared that sequence: sequence function (nvec) unlist(lapply(nvec, seq_len)) environment: namespace:base could benefit from being written in C to avoid unnecessary memory allocations. I made this version using inline: require( inline ) sequence_c - local( { fx - cfunction( signature( x = integer), ' int n = length(x) ; int* px = INTEGER(x) ; int x_i, s = 0 ; /* error checking */ for( int i=0; in; i++){ x_i = px[i] ; /* this includes the check for NA */ if( x_i = 0 ) error( needs non negative integer ) ; s += x_i ; } SEXP res = PROTECT( allocVector( INTSXP, s ) ) ; int * p_res = INTEGER(res) ; for( int i=0; in; i++){ x_i = px[i] ; for( int j=0; jx_i; j++, p_res++) *p_res = j+1 ; } UNPROTECT(1) ; return res ; ' ) function( nvec ){ fx( as.integer(nvec) ) } }) And here are some timings: x - 1:1 system.time( a - sequence(x ) ) utilisateur système écoulé 0.191 0.108 0.298 system.time( b - sequence_c(x ) ) utilisateur système écoulé 0.060 0.063 0.122 identical( a, b ) [1] TRUE system.time( for( i in 1:1) sequence(1:10) ) utilisateur système écoulé 0.119 0.000 0.119 system.time( for( i in 1:1) sequence_c(1:10) ) utilisateur système écoulé 0.019 0.000 0.019 I would write a proper patch if someone from R-core is willing to push it. Romain -- Romain Francois Professional R Enthusiast +33(0) 6 28 91 30 30 http://romainfrancois.blog.free.fr |- http://bit.ly/9VOd3l : ZAT! 2010 |- http://bit.ly/c6DzuX : Impressionnism with R `- http://bit.ly/czHPM7 : Rcpp Google tech talk on youtube -- Brian D. Ripley, rip...@stats.ox.ac.uk Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UKFax: +44 1865 272595__ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] faster base::sequence
Le 28/11/10 11:30, Prof Brian Ripley a écrit : On Sun, 28 Nov 2010, Romain Francois wrote: Le 28/11/10 10:30, Prof Brian Ripley a écrit : Is sequence used enough to warrant this? As the help page says Note that ‘sequence - function(nvec) unlist(lapply(nvec, seq_len))’ and it mainly exists in reverence to the very early history of R. I don't know. Would it be used more if it were more efficient ? It is for you to make a compelling case for others to do work (maintain changed code) for your wish. No trouble. The patch is there, if anyone finds it interesting or compelling, they will speak up I suppose. Otherwise it is fine for me if it ends up in no man's land. I have the code, if I want to use it, I can squeeze it in a package. I regard it as unsafe to assume that NA_INTEGER will always be negative, and bear in mind that at some point not so far off R integers (or at least lengths) will need to be more than 32-bit. sure. updated and dressed up as a patch. I've made it a .Call because I'm not really comfortable with .Internal, etc ... Do you mean that I should also use something else instead of int and int*. Is there some future proof typedef or macro for the type associated with INTSXP ? Not yet. I was explaining why NA_INTEGER might change. sure. thanks for the reminder. On Sun, 28 Nov 2010, Romain Francois wrote: Hello, Based on yesterday's R-help thread (help: program efficiency), and following Bill's suggestions, it appeared that sequence: sequence function (nvec) unlist(lapply(nvec, seq_len)) environment: namespace:base could benefit from being written in C to avoid unnecessary memory allocations. I made this version using inline: require( inline ) sequence_c - local( { fx - cfunction( signature( x = integer), ' int n = length(x) ; int* px = INTEGER(x) ; int x_i, s = 0 ; /* error checking */ for( int i=0; in; i++){ x_i = px[i] ; /* this includes the check for NA */ if( x_i = 0 ) error( needs non negative integer ) ; s += x_i ; } SEXP res = PROTECT( allocVector( INTSXP, s ) ) ; int * p_res = INTEGER(res) ; for( int i=0; in; i++){ x_i = px[i] ; for( int j=0; jx_i; j++, p_res++) *p_res = j+1 ; } UNPROTECT(1) ; return res ; ' ) function( nvec ){ fx( as.integer(nvec) ) } }) And here are some timings: x - 1:1 system.time( a - sequence(x ) ) utilisateur système écoulé 0.191 0.108 0.298 system.time( b - sequence_c(x ) ) utilisateur système écoulé 0.060 0.063 0.122 identical( a, b ) [1] TRUE system.time( for( i in 1:1) sequence(1:10) ) utilisateur système écoulé 0.119 0.000 0.119 system.time( for( i in 1:1) sequence_c(1:10) ) utilisateur système écoulé 0.019 0.000 0.019 I would write a proper patch if someone from R-core is willing to push it. Romain -- Romain Francois Professional R Enthusiast +33(0) 6 28 91 30 30 http://romainfrancois.blog.free.fr |- http://bit.ly/9VOd3l : ZAT! 2010 |- http://bit.ly/c6DzuX : Impressionnism with R `- http://bit.ly/czHPM7 : Rcpp Google tech talk on youtube -- Romain Francois Professional R Enthusiast +33(0) 6 28 91 30 30 http://romainfrancois.blog.free.fr |- http://bit.ly/9VOd3l : ZAT! 2010 |- http://bit.ly/c6DzuX : Impressionnism with R `- http://bit.ly/czHPM7 : Rcpp Google tech talk on youtube __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel