[Rd] faster base::sequence
Hello,

Based on yesterday's R-help thread (help: program efficiency), and following Bill's suggestions, it appeared that sequence:

> sequence
function (nvec)
unlist(lapply(nvec, seq_len))
<environment: namespace:base>

could benefit from being written in C to avoid unnecessary memory allocations.

I made this version using inline:

require( inline )
sequence_c <- local( {
    fx <- cfunction( signature( x = "integer"), '
        int n = length(x) ;
        int* px = INTEGER(x) ;
        int x_i, s = 0 ;
        /* error checking */
        for( int i=0; i<n; i++){
            x_i = px[i] ;
            /* this includes the check for NA */
            if( x_i <= 0 ) error( "needs non negative integer" ) ;
            s += x_i ;
        }
        SEXP res = PROTECT( allocVector( INTSXP, s ) ) ;
        int * p_res = INTEGER(res) ;
        for( int i=0; i<n; i++){
            x_i = px[i] ;
            for( int j=0; j<x_i; j++, p_res++)
                *p_res = j+1 ;
        }
        UNPROTECT(1) ;
        return res ;
    ' )
    function( nvec ){
        fx( as.integer(nvec) )
    }
})

And here are some timings:

> x <- 1:1
> system.time( a <- sequence(x ) )
   user  system elapsed
  0.191   0.108   0.298
> system.time( b <- sequence_c(x ) )
   user  system elapsed
  0.060   0.063   0.122
> identical( a, b )
[1] TRUE

> system.time( for( i in 1:1) sequence(1:10) )
   user  system elapsed
  0.119   0.000   0.119
> system.time( for( i in 1:1) sequence_c(1:10) )
   user  system elapsed
  0.019   0.000   0.019

I would write a proper patch if someone from R-core is willing to push it.

Romain

--
Romain Francois
Professional R Enthusiast
+33(0) 6 28 91 30 30
http://romainfrancois.blog.free.fr
|- http://bit.ly/9VOd3l : ZAT! 2010
|- http://bit.ly/c6DzuX : Impressionnism with R
`- http://bit.ly/czHPM7 : Rcpp Google tech talk on youtube

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
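The count-then-fill approach described in this message can be sketched in plain C, outside R's API. This is only an illustration under stated assumptions: `sequence_sketch` is a hypothetical name, `malloc` stands in for `allocVector`, and zero counts are accepted here because the R-level definition `unlist(lapply(nvec, seq_len))` accepts them.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper, not part of R: builds 1..nvec[0], 1..nvec[1], ...
   in a single allocation. Returns NULL on a negative count or on
   allocation failure; on success *out_len holds the result length and
   the caller must free() the returned buffer. */
int *sequence_sketch(const int *nvec, size_t n, size_t *out_len)
{
    size_t total = 0;

    /* Pass 1: validate the counts and compute the total length. */
    for (size_t i = 0; i < n; i++) {
        if (nvec[i] < 0)
            return NULL;
        total += (size_t) nvec[i];
    }

    /* One allocation for the whole result. malloc(0) may return NULL,
       so request at least one element. */
    int *res = malloc((total ? total : 1) * sizeof *res);
    if (res == NULL)
        return NULL;

    /* Pass 2: write each run 1..nvec[i] directly into the result. */
    int *p = res;
    for (size_t i = 0; i < n; i++)
        for (int j = 1; j <= nvec[i]; j++)
            *p++ = j;

    *out_len = total;
    return res;
}
```

Called with the counts {3, 0, 2}, the sketch fills a buffer of length 5 with 1 2 3 1 2. The point of the two passes is that no intermediate vectors are created, which is where the R-level version spends its allocations.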
Re: [Rd] faster base::sequence
Is sequence used enough to warrant this?  As the help page says

     Note that ‘sequence <- function(nvec) unlist(lapply(nvec,
     seq_len))’ and it mainly exists in reverence to the very early
     history of R.

I regard it as unsafe to assume that NA_INTEGER will always be negative, and bear in mind that at some point not so far off R integers (or at least lengths) will need to be more than 32-bit.

On Sun, 28 Nov 2010, Romain Francois wrote:

> Hello,
>
> Based on yesterday's R-help thread (help: program efficiency), and
> following Bill's suggestions, it appeared that sequence [...] could
> benefit from being written in C to avoid unnecessary memory allocations.
> [...]

--
Brian D. Ripley,                  rip...@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
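The two concerns raised in this reply can be sketched in plain C: test the missing-value sentinel by equality rather than relying on its sign, and accumulate the result length in a type wider than 32 bits. The names `NA_INT_SKETCH` and `checked_total` are hypothetical, and `INT_MIN` is only a placeholder for whatever value the sentinel has; the sketch does not depend on that value being negative.

```c
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for R's NA_INTEGER so the sketch compiles without R headers.
   Assumption: one int value is reserved to mean "missing". */
#define NA_INT_SKETCH INT_MIN

/* Hypothetical helper: sums the counts in nvec into *total.
   Returns 0 on success, -1 if a count is missing, -2 if a count is
   negative, -3 if the sum would not fit in a 32-bit length. */
int checked_total(const int *nvec, size_t n, int64_t *total)
{
    int64_t s = 0;

    for (size_t i = 0; i < n; i++) {
        int x = nvec[i];
        if (x == NA_INT_SKETCH)   /* missing: tested by equality, not by sign */
            return -1;
        if (x < 0)
            return -2;
        /* s <= INT_MAX here and x <= INT_MAX, so the 64-bit sum cannot overflow. */
        s += x;
        if (s > INT_MAX)
            return -3;
    }

    *total = s;
    return 0;
}
```

Keeping the missing-value test separate from the range test means the code would stay correct if the sentinel ever changed, which is the point made above about NA_INTEGER.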
Re: [Rd] faster base::sequence
On 28/11/10 10:30, Prof Brian Ripley wrote:
> Is sequence used enough to warrant this? As the help page says
>
> Note that ‘sequence <- function(nvec) unlist(lapply(nvec, seq_len))’
> and it mainly exists in reverence to the very early history of R.

I don't know. Would it be used more if it were more efficient?

> I regard it as unsafe to assume that NA_INTEGER will always be negative,
> and bear in mind that at some point not so far off R integers (or at
> least lengths) will need to be more than 32-bit.

Sure. Updated and dressed up as a patch.

I've made it a .Call because I'm not really comfortable with .Internal, etc ...

Do you mean that I should also use something else instead of int and int*? Is there some future-proof typedef or macro for the type associated with INTSXP?

Index: src/library/base/R/seq.R
===================================================================
--- src/library/base/R/seq.R	(revision 53680)
+++ src/library/base/R/seq.R	(working copy)
@@ -85,4 +85,6 @@
 }

 ## In reverence to the very first versions of R which already had sequence():
-sequence <- function(nvec) unlist(lapply(nvec, seq_len))
+# sequence <- function(nvec) unlist(lapply(nvec, seq_len))
+sequence <- function(nvec) .Call( "sequence", as.integer(nvec), PACKAGE = "base" )
+
Index: src/main/registration.c
===================================================================
--- src/main/registration.c	(revision 53680)
+++ src/main/registration.c	(working copy)
@@ -245,6 +245,8 @@
 CALLDEF(bitwiseOr, 2),
 CALLDEF(bitwiseXor, 2),

+/* sequence */
+CALLDEF(sequence,1),
 {NULL, NULL, 0}
 };
Index: src/main/seq.c
===================================================================
--- src/main/seq.c	(revision 53680)
+++ src/main/seq.c	(working copy)
@@ -679,3 +679,28 @@
 return ans;
 }

+
+SEXP attribute_hidden sequence(SEXP x)
+{
+    R_len_t n = length(x), s = 0 ;
+    int *px = INTEGER(x) ;
+    int x_i ;
+    /* error checking */
+    for( int i=0; i<n; i++){
+        x_i = px[i] ;
+        if( x_i == NA_INTEGER || x_i <= 0 )
+            error( _("argument must be coercible to non-negative integer") ) ;
+        s += x_i ;
+    }
+
+    SEXP res = PROTECT( allocVector( INTSXP, s ) ) ;
+    int *p_res = INTEGER(res) ;
+    for( int i=0; i<n; i++){
+        x_i = px[i] ;
+        for( int j=0; j<x_i; j++, p_res++)
+            *p_res = j+1 ;
+    }
+    UNPROTECT(1) ;
+    return res ;
+}
+
Index: src/main/basedecl.h
===================================================================
--- src/main/basedecl.h	(revision 53680)
+++ src/main/basedecl.h	(working copy)
@@ -114,3 +114,6 @@
 SEXP bitwiseAnd(SEXP, SEXP);
 SEXP bitwiseOr(SEXP, SEXP);
 SEXP bitwiseXor(SEXP, SEXP);
+
+SEXP sequence(SEXP);
+
[Rd] .Rdata file in data subdirectory won't load
Greetings,

I wanted to add a dataset to a complete R package I am working on (the package cleanly installs and passes the R CMD check). The data (a matrix) was saved, and the save() image dragged to the /data folder, and is a .Rdata file. It can be read directly using load (see below), but now the R CMD check indicates subdirectory data contains no datasets and it won't load using 'data()'. I have read through the R extensions manual and Leisch's tutorial, but can't find a good hint as to what is going wrong. I also tried this with 'polygon1' added to the 'export' in the NAMESPACE, with no effect.

> library(latticeDensity)
> data(polygon1)
Warning message:
In data(polygon1) : data set 'polygon1' not found
> file.choose()
[1] "C:\\Documents and Settings\\Ronald Barry\\My Documents\\latticeDensity\\data\\polygon1.Rdata"
> load("C:\\Documents and Settings\\Ronald Barry\\My Documents\\latticeDensity\\data\\polygon1.Rdata")
> polygon1
           [,1]      [,2]
 [1,] 0.6421053 0.8132050
 [2,] 0.6845247 0.4814305
 [3,] 0.7057345 0.2858322
 [4,] 0.6696779 0.2066025
 [5,] 0.5190888 0.1892710
 [6,] 0.5445405 0.4145805
 [7,] 0.5424195 0.6275103
 [8,] 0.5233307 0.7983494
 [9,] 0.5000000 0.7983494
[10,] 0.5127258 0.5458047
[11,] 0.5042419 0.3303989
[12,] 0.4851532 0.1348006
[13,] 0.2836606 0.1843191
[14,] 0.2582090 0.3675378
[15,] 0.1733700 0.6795048
[16,] 0.3154753 0.8057772
[17,] 0.2391202 1.0162311
[18,] 0.5381775 0.9592847
[19,] 0.7333071 0.9320495

Thank you for any pointers.
Re: [Rd] faster base::sequence
On Sun, 28 Nov 2010, Romain Francois wrote:

> I don't know. Would it be used more if it were more efficient ?

It is for you to make a compelling case for others to do work (maintain changed code) for your wish.

> Do you mean that I should also use something else instead of int and
> int*. Is there some future proof typedef or macro for the type
> associated with INTSXP ?

Not yet. I was explaining why NA_INTEGER might change.

--
Brian D. Ripley, rip...@stats.ox.ac.uk
Re: [Rd] faster base::sequence
On 28/11/10 11:30, Prof Brian Ripley wrote:
> It is for you to make a compelling case for others to do work (maintain
> changed code) for your wish.

No trouble. The patch is there; if anyone finds it interesting or compelling, they will speak up, I suppose. Otherwise it is fine for me if it ends up in no man's land. I have the code; if I want to use it, I can squeeze it into a package.

> Not yet. I was explaining why NA_INTEGER might change.

Sure. Thanks for the reminder.

--
Romain Francois
Professional R Enthusiast
Re: [Rd] .Rdata file in data subdirectory won't load
It needs to be polygon1.RData, not .Rdata. You've not actually told us your OS, but it looks like you imagine that R is case-insensitive just because Windows is.

On Sun, 28 Nov 2010, Ronald Barry wrote:

> I have read through the R extensions manual and Leisch's tutorial, but
> can't find

Try ?data !

> a good hint as to what is going wrong.

--
Brian D. Ripley, rip...@stats.ox.ac.uk
[Rd] package matrix dummy.cpp
Hi,

The recommended package matrix contains an empty file src/dummy.cpp which results in using g++ instead of gcc to link Matrix.so. What is the reason for that? Is there any difference between using g++ or gcc? (There are no other cpp files in the source.)

I asked the maintainers of the package (matrix-auth...@r-project.org) 3 weeks ago but haven't received any answer.

On my system (NixOS Linux distribution, http://nixos.org) I can't compile package Matrix unless this file is deleted.

Thank you very much,
Ambrus Kaposi
Re: [Rd] package matrix dummy.cpp
It is Matrix, not matrix.

I too have corresponded with them about this. It seems to be a legacy from when the package contained C++ code, and can now be deleted.

On Sun, 28 Nov 2010, Ambrus Kaposi wrote:

> On my system (NixOS Linux distribution, http://nixos.org) I can't
> compile package Matrix unless this file is deleted.

Most likely you have not installed the C++ compiler (which is usually g++ on Linux) -- but you shouldn't need to in order to install R.

--
Brian D. Ripley, rip...@stats.ox.ac.uk
[Rd] switch() disallowing multiple default values Re: Bug in parseNamespaceFile or switch( , ... ) ?
I've now committed changes in R-devel and R-patched to detect cases where a call to switch() contains multiple unnamed alternatives. The code only complains if the EXPR argument is a character string; unnamed alternatives are fine with numeric switching.

Adding this check turned up 3 more typos like this in the base code besides the one in parseNamespaceFile. I expect it will turn up quite a few more in CRAN and Bioconductor packages. Please let me know right away if you've got correct code that generates the warnings or errors.

Duncan Murdoch

In R-devel they're an error, in R-patched they'll just give a warning.

On 27/11/2010 7:09 PM, Duncan Murdoch wrote:
> On 27/11/2010 6:50 PM, Duncan Murdoch wrote:
> On 27/11/2010 5:58 PM, Charles C. Berry wrote:
>
> parseNamespaceFile() doesn't seem to detect misspelled directives.
> Looking at its code I see
>
>     switch(as.character(e[[1L]]),
>            <lots of args omitted here>,
>            stop(gettextf("unknown namespace directive: %s", deparse(e)),
>                 call. = FALSE, domain = NA))
>
> but this doesn't seem to function as I expect, viz. to stop with an
> error if I type a wrong directive.
>
> You're right, there was a typo in parseNamespaceFile.
>
> (The typo was in this line:
>
>     "=", "<-" = {
>
> This should have been
>
>     "=" =, "<-" = {
>
> Without the extra = sign, the "=" was taken as the default value of the
> switch, and the stop() was never reached.
>
> Conceivably switch() should complain if it is called with more than one
> default.
>
> I suspect when I fix this it's going to flush out some typos in packages
> on CRAN...
>
> Duncan Murdoch
>
> Duncan Murdoch
>
> Details:
>
> # create dummy NAMESPACE file with two bad / one good directives
> cat("blah( nada )\nblee( nil )\nexport( outDS )\n", file="NAMESPACE")
> readLines("NAMESPACE")
> [1] "blah( nada )"    "blee( nil )"     "export( outDS )"
> parseNamespaceFile("", ".") # now parse it
> $imports
> list()
>
> $exports
> [1] "outDS"
>
> $exportPatterns
> character(0)
>
> $importClasses
> list()
>
> $importMethods
> list()
>
> $exportClasses
> character(0)
>
> $exportMethods
> character(0)
>
> $exportClassPatterns
> character(0)
>
> $dynlibs
> character(0)
>
> $nativeRoutines
> list()
>
> $S3methods
>      [,1] [,2] [,3]
>
> So, it picked up 'export' and ignored the other two lines.
>
> Chuck
>
> p.s.
> sessionInfo()
> R version 2.12.0 (2010-10-15)
> Platform: i386-apple-darwin9.8.0/i386 (32-bit)
>
> locale:
> [1] C
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
> Charles C. Berry                  Dept of Family/Preventive Medicine
> cbe...@tajo.ucsd.edu              UC San Diego
> http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901
Re: [Rd] tar R command
First, if you look carefully, then you see that argument 'files' should specify *filepaths*, i.e. directories and not specific files. Thus, if you for instance place your files in directory foo/ and then call

    tar("foo.tar", files="foo/");

you would do the right thing.

HOWEVER, looking at the internals of utils::tar(), it seems to be designed for a non-Windows platform, i.e. it will not work on Windows as it stands (more below). A workaround that also illustrates the problems is the following patch:

    # PATCH for file.info() such that tar() works on Windows
    tar <- utils::tar;
    environment(tar) <- globalenv();
    file.info <- function(...) {
      fi <- base::file.info(...);
      fi[setdiff(c("uid", "gid", "uname", "grname"), names(fi))] <- NA;
      fi;
    } # file.info()

Example:

    > dir.create("foo/");
    > cat(file="foo/foo.txt", rep(letters, times=100));
    > tar("foo.tar", files="foo/");
    > str(file.info("foo.tar"));
    'data.frame':   1 obs. of  11 variables:
     $ size  : num 7680
     $ isdir : logi FALSE
     $ mode  :Class 'octmode'  int 438
     $ mtime : POSIXct, format: "2010-11-28 20:24:05"
     $ ctime : POSIXct, format: "2010-11-28 20:03:56"
     $ atime : POSIXct, format: "2010-11-28 20:07:40"
     $ exe   : chr "no"
     $ uid   : logi NA
     $ gid   : logi NA
     $ uname : logi NA
     $ grname: logi NA

This seems to generate a valid foo.tar file.

PROBLEMS:
Here are a few problems I have identified with tar().

PROBLEM #1: The default for argument files=NULL is documented to archive all files under the current directory. In reality it gives:

    Error in list.files(files, recursive = TRUE, all.files = TRUE, full.names = TRUE) :
      invalid 'directory' argument

because list.files(NULL) is invalid. The default should instead be files=".".

PROBLEM #2: If passing a non-existing path (argument 'files'), then tar() generates an invalid *.tar file of size 1024 bytes (not empty as the OP says). It would be better to assert that each of the requested paths really exists and is a directory, e.g. using file.info()$isdir.

PROBLEM #3: tar() assumes that file.info() returns a data.frame with fields 'uid', 'gid' and 'uname'. That is not the case for file.info() on Windows.

> sessionInfo()
R version 2.12.0 Patched (2010-11-24 r53656)
Platform: x86_64-pc-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

My $0.20

/Henrik

On Sun, Nov 28, 2010 at 7:00 PM, Dario Strbenac <d.strbe...@garvan.org.au> wrote:
> Hello,
>
> The documentation for the tar command leads me to think there is an
> internal implementation when the command can't be found in the OS.
> However, it doesn't seem to be the case, as I get an empty .tar file
> generated on a small example I made :
>
> > dir(pattern = "jpg")
> [1] "MA56237502_635.jpg"
> > file.info("MA56237502_635.jpg")
>                      size isdir mode               mtime               ctime               atime exe
> MA56237502_635.jpg 229831 FALSE  666 2010-11-29 13:05:49 2010-11-29 13:00:36 2010-11-29 13:00:36  no
> > tar("example.tar", files = dir(pattern = "jpg"))
> > file.info("example.tar")
>             size isdir mode               mtime               ctime               atime exe
> example.tar 1024 FALSE  666 2010-11-29 13:43:29 2010-11-29 13:42:30 2010-11-29 13:42:30  no
>
> Is this an unimplemented feature ?
>
> > sessionInfo()
> R version 2.12.0 (2010-10-15)
> Platform: x86_64-pc-mingw32/x64 (64-bit)
> ... ... ...
>
> Thanks,
> Dario.