Re: [Rd] Where to drop a python script?
The old convention was that it went in the exec/ directory, but as you can see at http://cran.at.r-project.org/doc/manuals/r-devel/R-exts.html#Non_002dR-scripts-in-packages it can be in inst/anyName/. A minor convenience of exec/ is that the directory has the same name in the source and the installed package, whereas inst/anyName/ is installed as anyName/, so debugging can be a tiny bit easier with exec/.

Having just put a package (TSjson) on CRAN with a Python script, here are a few other pointers for getting it on CRAN:

-SystemRequirements: should indicate whether a particular version of Python is needed, and any non-default modules that are needed. (My package does not work with Python 3 because some modules are not available.) Some of the libraries have changed, so it could be a bit tricky to make something work easily with both 2 and 3.
-You need a README to explain how to install Python. (If you look at or use mine, please let me know if you find problems.)
-The Linux and Sun CRAN test machines have Python 2, whereas winbuilder has Python 3. Be prepared to explain that the package will not work on one or the other.

Another alternative to system() is pipe().

Paul

On 13-10-30 03:15 PM, Dirk Eddelbuettel wrote:
On 30 October 2013 at 13:54, Jonathan Greenberg wrote:
| R-developers:
|
| I have a small python script that I'd like to include in an R package I'm
| developing, but I'm a bit unclear about which subfolder it should go in. R
| will be calling the script via a system() call. Thanks!

Up to you as you control the path.

As "Writing R Extensions" explains, everything below the (source) directory inst/ will get installed. I like inst/extScripts/ (or similar) as it denotes that it is an external script.

As an example, the gdata package has Perl code for xls reading/writing below a directory inst/perl/ -- and I think there are more packages doing this.

Dirk

__ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
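Paul mentions pipe() as another option besides system(). A minimal sketch of that route follows; pipe_lines() is a hypothetical helper name, not part of any package, and the usage lines only run if a `python` executable happens to be on the PATH. The usage also shows one possible way to detect the Python 2 vs 3 split he describes before running a bundled script.

```r
# Hypothetical helper (not from any package): run an external command via
# pipe() and return its standard output, one element per line.
pipe_lines <- function(command, args = character()) {
  # Quote the command and every argument so paths with spaces survive the shell.
  cmd <- paste(shQuote(c(command, args)), collapse = " ")
  con <- pipe(cmd, open = "r")
  on.exit(close(con))
  readLines(con)
}

# Usage: if a "python" is on the PATH, ask it for its major version, one way
# to detect the Python 2 vs 3 difference before running a bundled script.
py <- Sys.which("python")
if (nzchar(py)) {
  major <- pipe_lines(py, c("-c", "import sys; print(sys.version_info[0])"))
  message("python major version: ", paste(major, collapse = " "))
}
```

Compared with system(cmd, intern = TRUE), the connection returned by pipe() can also be read a chunk at a time (for example with readLines(con, n = 100) in a loop), which may matter for scripts that produce a lot of output.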
Re: [Rd] Huge performance difference between implicit and explicit print
Hadley,

As far as I can tell from a quick look, it is because implicit printing uses a different mechanism, which does a fair bit more work. From comments in print.c in the R sources:

 * print.default() -> do_printdefault (with call tree below)
 *
 * auto-printing   -> PrintValueEnv
 *                    -> PrintValueRec
 *                    -> call print() for objects
 * Note that auto-printing does not call print.default.
 * PrintValue, R_PV are similar to auto-printing.

PrintValueEnv includes, among other things, checks for functions, S4 objects, and S3 objects before constructing (in C code) an R call to print for S3 objects and show for S4 objects, and evaluating it using Rf_eval. So there is an extra trip to the R evaluator. I imagine that extra work is where the hang-up is, but that is a slightly informed guess, as I haven't done any detailed timings or checks.

Basically, my understanding of the two processes is as follows:

print(df): the print call is evaluated, S3 dispatch happens, print.default in C is called, the result is printed to the terminal, and the print call returns.

df: the expression "df" is evaluated, auto-printing is initiated, the type of the object returned by the expression is determined, a print call is constructed in C code, that print call is evaluated in C code, THEN all the stuff above happens.

I dunno if that helps or not, as I can't speak to how to change/fix it atm.

~G

On Wed, Oct 30, 2013 at 3:22 PM, Hadley Wickham wrote:
> Hi all,
>
> Can anyone help me understand why an implicit print (i.e. just typing
> df at the console) is so much slower than an explicit print (i.e.
> print(df)) in the example below? I see the difference both in RStudio
> and in a terminal.
>
> # Construct large df as quickly as possible
> dummy <- 1:18e6
> df <- lapply(1:10, function(x) dummy)
> names(df) <- letters[1:10]
> class(df) <- c("myobj", "data.frame")
> attr(df, "row.names") <- .set_row_names(18e6)
>
> print.myobj <- function(x, ...) {
>   print.data.frame(head(x, 2))
> }
>
> start <- proc.time(); df; flush.console(); proc.time() - start
> #  user  system elapsed
> # 0.408   0.557   0.965
> start <- proc.time(); print(df); flush.console(); proc.time() - start
> #  user  system elapsed
> # 0.019   0.002   0.020
>
> sessionInfo()
> # R version 3.0.2 (2013-09-25)
> # Platform: x86_64-apple-darwin10.8.0 (64-bit)
> #
> # locale:
> # [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
> #
> # attached base packages:
> # [1] stats     graphics  grDevices utils     datasets  methods   base
>
> Thanks!
>
> Hadley
>
> --
> Chief Scientist, RStudio
> http://had.co.nz/

-- 
Gabriel Becker
Graduate Student
Statistics Department
University of California, Davis
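To make the branching Gabriel quotes from print.c a little more concrete, here is a simplified R-level model of it. This is an illustration under stated assumptions, not the actual C implementation: auto_print_model() and the toy class are hypothetical, the special handling of functions is left out, and the model does not reproduce the extra trip through Rf_eval that he suspects is the cost.

```r
# Simplified R-level MODEL of the auto-printing dispatch described above.
# It is not the C code in print.c; it only mirrors the branching:
# S4 objects go to show(), other classed (S3) objects get an explicit
# print() call, and everything else falls back to print.default().
auto_print_model <- function(x) {
  if (isS4(x)) {
    methods::show(x)
  } else if (is.object(x)) {
    print(x)            # S3 dispatch, e.g. to a print.myobj() method
  } else {
    print.default(x)
  }
  invisible(x)
}

# Hypothetical toy S3 class, for illustration only.
print.toy <- function(x, ...) {
  cat("<toy of length", length(unclass(x)), ">\n")
  invisible(x)
}

auto_print_model(structure(1:3, class = "toy"))
auto_print_model(1:3)
```

The model cannot be used to time the real auto-print path, since genuine auto-printing only happens at the top level of the REPL; it only shows which checks sit in front of the constructed print() call.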
Re: [Rd] Huge performance difference between implicit and explicit print
On Wed, Oct 30, 2013 at 6:22 PM, Hadley Wickham wrote:
> Hi all,
>
> Can anyone help me understand why an implicit print (i.e. just typing
> df at the console) is so much slower than an explicit print (i.e.
> print(df)) in the example below? I see the difference both in RStudio
> and in a terminal.
>
> # Construct large df as quickly as possible
> dummy <- 1:18e6
> df <- lapply(1:10, function(x) dummy)
> names(df) <- letters[1:10]
> class(df) <- c("myobj", "data.frame")
> attr(df, "row.names") <- .set_row_names(18e6)
>
> print.myobj <- function(x, ...) {
>   print.data.frame(head(x, 2))
> }
>
> start <- proc.time(); df; flush.console(); proc.time() - start
> #  user  system elapsed
> # 0.408   0.557   0.965
> start <- proc.time(); print(df); flush.console(); proc.time() - start
> #  user  system elapsed
> # 0.019   0.002   0.020

If I change print(df) to print.data.frame(df) it hangs.

R version 3.0.2 Patched (2013-10-06 r64031) -- "Frisbee Sailing"
Copyright (C) 2013 The R Foundation for Statistical Computing
Platform: x86_64-w64-mingw32/x64 (64-bit)
[Rd] Huge performance difference between implicit and explicit print
Hi all,

Can anyone help me understand why an implicit print (i.e. just typing df at the console) is so much slower than an explicit print (i.e. print(df)) in the example below? I see the difference both in RStudio and in a terminal.

# Construct large df as quickly as possible
dummy <- 1:18e6
df <- lapply(1:10, function(x) dummy)
names(df) <- letters[1:10]
class(df) <- c("myobj", "data.frame")
attr(df, "row.names") <- .set_row_names(18e6)

print.myobj <- function(x, ...) {
  print.data.frame(head(x, 2))
}

start <- proc.time(); df; flush.console(); proc.time() - start
#  user  system elapsed
# 0.408   0.557   0.965
start <- proc.time(); print(df); flush.console(); proc.time() - start
#  user  system elapsed
# 0.019   0.002   0.020

sessionInfo()
# R version 3.0.2 (2013-09-25)
# Platform: x86_64-apple-darwin10.8.0 (64-bit)
#
# locale:
# [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
#
# attached base packages:
# [1] stats     graphics  grDevices utils     datasets  methods   base

Thanks!

Hadley

-- 
Chief Scientist, RStudio
http://had.co.nz/
Re: [Rd] Where to drop a python script?
On Oct 30, 2013, at 1:54 PM, Jonathan Greenberg wrote:

> R-developers:
>
> I have a small python script that I'd like to include in an R package I'm
> developing, but I'm a bit unclear about which subfolder it should go in. R
> will be calling the script via a system() call. Thanks!
>
> --j

See Writing R Extensions Manual, section 1.1.7:

http://cran.r-project.org/doc/manuals/r-release/R-exts.html#Non_002dR-scripts-in-packages

If you want to see a package example, my WriteXLS package uses Perl, but the concepts will be the same:

https://github.com/marcschwartz/WriteXLS

If you look at WriteXLS.R around line 130, you can see an example of getting the path to the included Perl scripts that I use, which are in the 'inst/Perl' folder. Further down, around line 230, is where the script is called via system(). Note the use of shQuote() for some arguments.

Regards,

Marc Schwartz
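Putting the advice in this thread together, a package can locate its installed script with system.file() and run it with shQuote()d arguments. A minimal sketch, assuming a hypothetical package "mypkg" that ships inst/extScripts/myscript.py (installed as extScripts/myscript.py); run_installed_script() is likewise a hypothetical name, and system2() is used instead of system() because it takes the command and its arguments separately.

```r
# Hypothetical sketch: "mypkg", "extScripts" and "myscript.py" are
# placeholders. Files under inst/extScripts/ in the source package end up
# in extScripts/ of the installed package.
run_installed_script <- function(script, args = character(),
                                 pkg = "mypkg", subdir = "extScripts",
                                 interpreter = "python") {
  # system.file() returns "" (rather than failing) when nothing is found.
  path <- system.file(subdir, script, package = pkg)
  if (!nzchar(path)) {
    stop("cannot find ", file.path(subdir, script), " in package ", pkg)
  }
  # shQuote() protects paths and arguments that contain spaces;
  # stdout = TRUE captures the script's output as a character vector.
  system2(interpreter, args = shQuote(c(path, args)), stdout = TRUE)
}
```

Inside the package, a call such as run_installed_script("myscript.py", args = "input.csv") would then work wherever the package library is installed; the file names are placeholders only.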
Re: [Rd] Where to drop a python script?
On 30 October 2013 at 13:54, Jonathan Greenberg wrote:
| R-developers:
|
| I have a small python script that I'd like to include in an R package I'm
| developing, but I'm a bit unclear about which subfolder it should go in. R
| will be calling the script via a system() call. Thanks!

Up to you as you control the path.

As "Writing R Extensions" explains, everything below the (source) directory inst/ will get installed. I like inst/extScripts/ (or similar) as it denotes that it is an external script.

As an example, the gdata package has Perl code for xls reading/writing below a directory inst/perl/ -- and I think there are more packages doing this.

Dirk

-- 
Dirk Eddelbuettel | e...@debian.org | http://dirk.eddelbuettel.com
[Rd] Where to drop a python script?
R-developers:

I have a small python script that I'd like to include in an R package I'm developing, but I'm a bit unclear about which subfolder it should go in. R will be calling the script via a system() call. Thanks!

--j

-- 
Jonathan A. Greenberg, PhD
Assistant Professor
Global Environmental Analysis and Remote Sensing (GEARS) Laboratory
Department of Geography and Geographic Information Science
University of Illinois at Urbana-Champaign
259 Computing Applications Building, MC-150
605 East Springfield Avenue
Champaign, IL 61820-6371
Phone: 217-300-1924
http://www.geog.illinois.edu/~jgrn/
AIM: jgrn307, MSN: jgrn...@hotmail.com, Gchat: jgrn307, Skype: jgrn3007
Re: [Rd] unique(1:3,nmax=1) freezes R
I can reproduce this bug.

> sessionInfo()
R version 3.0.1 (2013-05-16)
Platform: i386-w64-mingw32/i386 (32-bit)

locale:
[1] LC_COLLATE=Dutch_Belgium.1252  LC_CTYPE=Dutch_Belgium.1252    LC_MONETARY=Dutch_Belgium.1252
[4] LC_NUMERIC=C                   LC_TIME=Dutch_Belgium.1252

attached base packages:
[1] grDevices datasets  splines   graphics  stats     tcltk     utils     methods   base

other attached packages:
[1] svSocket_0.9-55 TinnR_1.0-5     R2HTML_2.2.1    Hmisc_3.12-2    Formula_1.1-1
[6] survival_2.37-4

loaded via a namespace (and not attached):
[1] cluster_1.14.4  fortunes_1.5-0  grid_3.0.1      lattice_0.20-23 rpart_4.1-3
[6] svMisc_0.9-69   tools_3.0.1

On Wed, Oct 30, 2013 at 9:05 AM, Helske Jouni wrote:
> Dear all,
> I was playing around with factor contrasts, and found the argument nmax of
> function factor. When using nmax=1, R froze completely, and I had to close
> it from the task manager. After some debugging, I found that the problem is
> actually in the unique function, where the internal unique function is called:
>
> .Internal(unique(x, incomparables, fromLast, nmax))
>
> More generally, it looks like unique(x,nmax=k) freezes R when the length of x
> is larger than 2 and k=1, and when nmax unique.default(1:5, nmax = 3) : hash table is full".
>
> Of course using nmax=1 doesn't make much sense, but maybe some check should
> be in place before calling internal unique?
>
> Best regards,
>
> Jouni Helske

-- 
Joris Meys
Statistical consultant

Ghent University
Faculty of Bioscience Engineering
Department of Mathematical Modelling, Statistics and Bio-Informatics

tel : +32 9 264 59 87
joris.m...@ugent.be
Re: [Rd] Imports, importFrom slow (for Matrix)
> Gábor Csárdi
> on Tue, 29 Oct 2013 10:31:14 -0400 writes:

> Oh, you mean to put Matrix:: in the functions that need
> Matrix, right, of course. Then yes, this could be a
> solution. I have some issue with some new class
> definitions, but I can probably work them out.
> Gabor

otherwise, please contact matrix-auth...@r-project.org (me being one of the two).

To the whole issue in the 'Subject': Yes, importing from Matrix, i.e., loading the Matrix namespace, is indeed slow, notably compared to many other parts of R, including its startup. I have not had time to investigate, but I even have a vague feeling that it got slower than it used to be a couple of R (and Matrix) versions ago.

Note that your timings below are a bit biased because you use Rscript, which unfortunately does not pre-[load + attach] the 'methods' package. But then your bias is only about 0.1 seconds from my measurements (the time increases from 0.35 to 0.45; not sure if this is still a good enough reason to omit 'methods' from Rscript by default).

And ... yes, R is free (aka "libre") software, and proposals for changes that speed up loadNamespace("Matrix") without having to change the R code inside Matrix much are highly appreciated.

Martin Maechler, ETH Zurich

> On Tue, Oct 29, 2013 at 10:25 AM, Gábor Csárdi wrote:
>> Unfortunately that seems to be (almost) just as slow.
>>
>> ~$ time Rscript -e 'Matrix::summary; ls()' > /dev/null
>> real 0m2.785s
>> user 0m2.668s
>> sys  0m0.112s
>>
>> Gabor
>>
>> On Tue, Oct 29, 2013 at 10:11 AM, Prof Brian Ripley wrote:
>>> On 29/10/2013 14:03, Gábor Csárdi wrote:
Dear All,

before its latest version my package had 'Imports: Matrix' in its DESCRIPTION file, but it did not import anything in NAMESPACE. Rather, some functions explicitly loaded Matrix, as they needed it. The reason for this was that importing Matrix is really slow, and only very few igraph functions need it. (I guess Matrix is slow because of the many registered names, but that is another question.)
# Empty session:
~$ time Rscript -e 'ls()' > /dev/null
real 0m0.251s
user 0m0.196s
sys  0m0.049s

# Without importing from Matrix:
~$ time Rscript -e 'library(igraph); ls()' > /dev/null
Loading required package: methods
real 0m0.419s
user 0m0.363s
sys  0m0.049s

# Adding importFrom(Matrix, sparseMatrix) to NAMESPACE:
~$ time Rscript -e 'library(igraph); ls()' > /dev/null
Loading required package: methods
real 0m2.963s
user 0m2.844s
sys  0m0.115s

This solution was fine with me, especially because other packages depending on igraph and using Matrix through igraph worked fine on the CRAN build servers, as igraph brought Matrix with it. (The build servers don't have recommended packages like Matrix available by default.)

Recently, R CMD check does not allow me to list Matrix in Imports without importing something from it. This is understandable, because it is an inconsistency after all, but it caused some headache for me. A 3s loading time for a package is IMHO much longer than ideal, especially as loading R itself is ten times faster. So I definitely don't want to import from Matrix right now.

The solution I settled on was to include Matrix in 'Suggests', and then load it selectively, as before. Now some packages depending on igraph are failing on the CRAN build servers, which don't have Matrix installed for these packages. (Luckily they are probably not failing for users, because most users do have the recommended packages.)

In summary, it would be great to speed up imports. Another solution would be some mechanism that allows me to import from a package as needed, not at package loading time. Something like a delayed importFrom().
>>>
>>> That is what Matrix:: does. There is nothing like
>>> enough here for us to tell why it would not suffice for
>>> you. (If you want to import something occasionally and
>>> then use it very many times, make a local copy.)
>>>
Just wanted to bring up this issue.
Best,
Gabor

>>>
>>> --
>>> Brian D. Ripley, rip...@stats.ox.ac.uk
>>> Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
>>> University of Oxford, Tel: +44 1865 272861 (
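Brian Ripley's suggestion (use Matrix:: where needed, and keep a local copy for repeated use) can be sketched as follows. to_sparse() and many_diagonals() are hypothetical functions; the requireNamespace() guard is what a package listing Matrix only in 'Suggests' (as Gabor ended up doing) would need, since Matrix may then not be installed. Matrix::sparseMatrix loads the Matrix namespace on first use rather than when the calling package is loaded.

```r
# Hypothetical function: Matrix is only loaded when this is actually called.
to_sparse <- function(i, j, x) {
  if (!requireNamespace("Matrix", quietly = TRUE)) {
    stop("package 'Matrix' is required for to_sparse()")
  }
  Matrix::sparseMatrix(i = i, j = j, x = x)  # loads the namespace on first use
}

# Ripley's "local copy" idea: look the function up once, then call the
# local binding many times instead of going through Matrix:: in the loop.
many_diagonals <- function(n) {
  if (!requireNamespace("Matrix", quietly = TRUE)) {
    stop("package 'Matrix' is required for many_diagonals()")
  }
  sparse <- Matrix::sparseMatrix
  lapply(seq_len(n), function(k) {
    sparse(i = seq_len(k), j = seq_len(k), x = rep(1, k))  # k x k identity
  })
}
```

If Matrix stays in 'Imports' and is only referenced via Matrix::, the requireNamespace() checks can be dropped because Matrix is then guaranteed to be installed, while its namespace is still loaded only on first use.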
Re: [Rd] Imports, importFrom slow (for Matrix)
> Gábor Csárdi
> on Tue, 29 Oct 2013 10:31:14 -0400 writes:

> Oh, you mean to put Matrix:: in the functions that need
> Matrix, right, of course. Then yes, this could be a
> solution. I have some issue with some new class
> definitions, but I can probably work them out.
> Gabor

otherwise, please contact matrix-auth...@r-project.org (me being one of the two).

To the whole issue in the 'Subject': Yes, importing from Matrix, i.e., loading the Matrix namespace, is indeed slow, notably compared to many other parts of R, including its startup. I have not had time to investigate, but I even have a vague feeling that it got considerably slower than it used to be a couple of R (and Matrix) versions ago.

Note that your timings below are a bit biased because you use Rscript, which unfortunately does not pre-[load + attach] the 'methods' package. But your bias is only about 0.1 second from my measurements (the time increases from 0.35 to 0.45); I am not sure if this is still a good enough reason to omit 'methods' from Rscript.

Martin

> On Tue, Oct 29, 2013 at 10:25 AM, Gábor Csárdi wrote:
>> Unfortunately that seems to be (almost) just as slow.
>>
>> ~$ time Rscript -e 'Matrix::summary; ls()' > /dev/null
>> real 0m2.785s
>> user 0m2.668s
>> sys  0m0.112s
>>
>> Gabor
>>
>> On Tue, Oct 29, 2013 at 10:11 AM, Prof Brian Ripley wrote:
>>> On 29/10/2013 14:03, Gábor Csárdi wrote:
Dear All,

before its latest version my package had 'Imports: Matrix' in its DESCRIPTION file, but it did not import anything in NAMESPACE. Rather, some functions explicitly loaded Matrix, as they needed it. The reason for this was that importing Matrix is really slow, and only very few igraph functions need it. (I guess Matrix is slow because of the many registered names, but that is another question.)
# Empty session:
~$ time Rscript -e 'ls()' > /dev/null
real 0m0.251s
user 0m0.196s
sys  0m0.049s

# Without importing from Matrix:
~$ time Rscript -e 'library(igraph); ls()' > /dev/null
Loading required package: methods
real 0m0.419s
user 0m0.363s
sys  0m0.049s

# Adding importFrom(Matrix, sparseMatrix) to NAMESPACE:
~$ time Rscript -e 'library(igraph); ls()' > /dev/null
Loading required package: methods
real 0m2.963s
user 0m2.844s
sys  0m0.115s

This solution was fine with me, especially because other packages depending on igraph and using Matrix through igraph worked fine on the CRAN build servers, as igraph brought Matrix with it. (The build servers don't have recommended packages like Matrix available by default.)

Recently, R CMD check does not allow me to list Matrix in Imports without importing something from it. This is understandable, because it is an inconsistency after all, but it caused some headache for me. A 3s loading time for a package is IMHO much longer than ideal, especially as loading R itself is ten times faster. So I definitely don't want to import from Matrix right now.

The solution I settled on was to include Matrix in 'Suggests', and then load it selectively, as before. Now some packages depending on igraph are failing on the CRAN build servers, which don't have Matrix installed for these packages. (Luckily they are probably not failing for users, because most users do have the recommended packages.)

In summary, it would be great to speed up imports. Another solution would be some mechanism that allows me to import from a package as needed, not at package loading time. Something like a delayed importFrom().
>>>
>>> That is what Matrix:: does. There is nothing like
>>> enough here for us to tell why it would not suffice for
>>> you. (If you want to import something occasionally and
>>> then use it very many times, make a local copy.)
>>>
Just wanted to bring up this issue.
Best,
Gabor

>>>
>>> --
>>> Brian D. Ripley, rip...@stats.ox.ac.uk
>>> Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
>>> University of Oxford, Tel: +44 1865 272861 (self)
>>> 1 South Parks Road, +44 1865 272866 (PA)
>>> Oxford OX1 3TG, UK Fax: +44 1865 272595
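Martin's remark about Rscript and 'methods' suggests measuring the namespace load on its own. A minimal sketch follows; time_namespace_load() is a hypothetical helper. Because loadNamespace() returns almost immediately for a namespace that is already loaded, the number is only meaningful in a fresh session, so the result also records whether that was the case.

```r
# Hypothetical helper: time loadNamespace() for one package from inside R.
time_namespace_load <- function(pkg) {
  already_loaded <- pkg %in% loadedNamespaces()  # TRUE means timing is ~0
  elapsed <- system.time(loadNamespace(pkg))[["elapsed"]]
  list(package = pkg, already_loaded = already_loaded, elapsed = elapsed)
}

# Whether 'methods' is already attached affects how the cost is split
# between it and the first package that needs it (Martin's point).
"package:methods" %in% search()
```

For a comparison like Gabor's, the same measurement can be run in fresh processes from the shell, e.g. Rscript -e 'system.time(loadNamespace("Matrix"))' versus Rscript -e 'library(methods); system.time(loadNamespace("Matrix"))', which separates the cost of loading 'methods' from the cost of Matrix itself.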
[Rd] unique(1:3,nmax=1) freezes R
Dear all,

I was playing around with factor contrasts, and found the argument nmax of function factor. When using nmax=1, R froze completely, and I had to close it from the task manager. After some debugging, I found that the problem is actually in the unique function, where the internal unique function is called:

.Internal(unique(x, incomparables, fromLast, nmax))

More generally, it looks like unique(x,nmax=k) freezes R when the length of x is larger than 2 and k=1, and when nmax
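Jouni suggests a check before the internal call. A user-level guard along those lines could look like this; safe_unique() is a hypothetical wrapper, not part of base R, and rejecting nmax < 2 is an assumption drawn only from the symptoms in this report, not from how the bug was eventually handled in R.

```r
# Hypothetical wrapper: validate nmax before unique() reaches the internal
# hash-table code. The "nmax >= 2" rule is an assumption based only on the
# reported symptoms (nmax = 1 hangs for length(x) > 2).
safe_unique <- function(x, incomparables = FALSE, nmax = NA, ...) {
  if (length(nmax) != 1L) {
    stop("'nmax' must be a single value")
  }
  if (!is.na(nmax) && (!is.numeric(nmax) || nmax < 2)) {
    stop("'nmax' must be a number >= 2 (or NA)")
  }
  unique(x, incomparables = incomparables, nmax = nmax, ...)
}
```

The guard fires before unique() is reached, so safe_unique(1:3, nmax = 1) signals an error instead of hanging on the affected R versions (3.0.x in this thread).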