Re: [Rd] URL checks
One other failure mode: SSL certificates trusted by browsers that are not installed on the check machine, e.g. the "GEANT Vereniging" certificate from https://relational.fit.cvut.cz/ . K On 07.01.21 12:14, Kirill Müller via R-devel wrote: Hi The URL checks in R CMD check test all links in the README and vignettes for broken or redirected links. In many cases this improves documentation, I see problems with this approach which I have detailed below. I'm writing to this mailing list because I think the change needs to happen in R's check routines. I propose to introduce an "allow-list" for URLs, to reduce the burden on both CRAN and package maintainers. Comments are greatly appreciated. Best regards Kirill # Problems with the detection of broken/redirected URLs ## 301 should often be 307, how to change? Many web sites use a 301 redirection code that probably should be a 307. For example, https://www.oracle.com and https://www.oracle.com/ both redirect to https://www.oracle.com/index.html with a 301. I suspect the company still wants oracle.com to be recognized as the primary entry point of their web presence (to reserve the right to move the redirection to a different location later), I haven't checked with their PR department though. If that's true, the redirect probably should be a 307, which should be fixed by their IT department which I haven't contacted yet either. $ curl -i https://www.oracle.com HTTP/2 301 server: AkamaiGHost content-length: 0 location: https://www.oracle.com/index.html ... ## User agent detection twitter.com responds with a 400 error for requests without a user agent string hinting at an accepted browser. $ curl -i https://twitter.com/ HTTP/2 400 ... ...Please switch to a supported browser.. $ curl -s -i https://twitter.com/ -A "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:84.0) Gecko/20100101 Firefox/84.0" | head -n 1 HTTP/2 200 # Impact While the latter problem *could* be fixed by supplying a browser-like user agent string, the former problem is virtually unfixable -- so many web sites should use 307 instead of 301 but don't. The above list is also incomplete -- think of unreliable links, HTTP links, other failure modes... This affects me as a package maintainer, I have the choice to either change the links to incorrect versions, or remove them altogether. I can also choose to explain each broken link to CRAN, this subjects the team to undue burden I think. Submitting a package with NOTEs delays the release for a package which I must release very soon to avoid having it pulled from CRAN, I'd rather not risk that -- hence I need to remove the link and put it back later. I'm aware of https://github.com/r-lib/urlchecker, this alleviates the problem but ultimately doesn't solve it. # Proposed solution ## Allow-list A file inst/URL that lists all URLs where failures are allowed -- possibly with a list of the HTTP codes accepted for that link. Example: https://oracle.com/ 301 https://twitter.com/drob/status/1224851726068527106 400 __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
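For what it's worth, this failure mode can be reproduced from R without a browser (this assumes the curl package; whether it fails depends on which CA certificates are installed on the check machine):

# Fetch the URL mentioned above and report either the HTTP status or the
# TLS error raised when the issuing CA is missing from the local trust store:
tryCatch(
  curl::curl_fetch_memory("https://relational.fit.cvut.cz/")$status_code,
  error = function(e) conditionMessage(e)
)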
[Rd] URL checks
Hi

The URL checks in R CMD check test all links in the README and vignettes for broken or redirected links. In many cases this improves documentation, but I see problems with this approach, which I have detailed below. I'm writing to this mailing list because I think the change needs to happen in R's check routines. I propose to introduce an "allow-list" for URLs, to reduce the burden on both CRAN and package maintainers. Comments are greatly appreciated.

Best regards
Kirill

# Problems with the detection of broken/redirected URLs

## 301 should often be 307, how to change?

Many web sites use a 301 redirection code that probably should be a 307. For example, https://www.oracle.com and https://www.oracle.com/ both redirect to https://www.oracle.com/index.html with a 301. I suspect the company still wants oracle.com to be recognized as the primary entry point of their web presence (to reserve the right to move the redirection to a different location later), though I haven't checked with their PR department. If that's true, the redirect probably should be a 307, which would have to be fixed by their IT department, which I haven't contacted yet either.

$ curl -i https://www.oracle.com
HTTP/2 301
server: AkamaiGHost
content-length: 0
location: https://www.oracle.com/index.html
...

## User agent detection

twitter.com responds with a 400 error for requests without a user agent string hinting at an accepted browser.

$ curl -i https://twitter.com/
HTTP/2 400
...
...Please switch to a supported browser..

$ curl -s -i https://twitter.com/ -A "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:84.0) Gecko/20100101 Firefox/84.0" | head -n 1
HTTP/2 200

# Impact

While the latter problem *could* be fixed by supplying a browser-like user agent string, the former problem is virtually unfixable -- so many web sites should use 307 instead of 301 but don't. The above list is also incomplete -- think of unreliable links, HTTP links, other failure modes...

This affects me as a package maintainer: I have the choice to either change the links to incorrect versions or remove them altogether. I can also choose to explain each broken link to CRAN, but I think this subjects the team to undue burden. Submitting a package with NOTEs delays the release; for a package that I must release very soon to avoid having it pulled from CRAN, I'd rather not risk that -- hence I need to remove the link and put it back later. I'm aware of https://github.com/r-lib/urlchecker; it alleviates the problem but ultimately doesn't solve it.

# Proposed solution

## Allow-list

A file inst/URL that lists all URLs where failures are allowed -- possibly with a list of the HTTP codes accepted for that link. Example:

https://oracle.com/ 301
https://twitter.com/drob/status/1224851726068527106 400

__ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
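A rough sketch of what consuming such an allow-list could look like during the check (the file format is as in the example above; the function names are illustrative and nothing below exists in R):

# Read inst/URL and decide whether a failure for a given URL/status
# combination should be waived (sketch only):
read_url_allowlist <- function(path) {
  lines <- readLines(path, warn = FALSE)
  parts <- strsplit(trimws(lines), "[[:space:]]+")
  data.frame(
    url    = vapply(parts, `[`, character(1), 1),
    status = vapply(parts, function(x) as.integer(x[2]), integer(1))
  )
}

is_failure_allowed <- function(url, status, allowlist) {
  hit <- allowlist$url == url
  # a missing status in the file means "any failure is accepted"
  any(hit & (is.na(allowlist$status) | allowlist$status == status))
}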
[Rd] Profiling: attributing costs to place of invocation (instead of place of evaluation)?
Hi Consider the following example: f <- function(expr) g(expr) g <- function(expr) { h(expr) } h <- function(expr) { expr # evaluation happens here i(expr) } i <- function(expr) { expr # already evaluated, no costs here invisible() } rprof <- tempfile() Rprof(rprof) f(replicate(1e2, sample.int(1e4))) Rprof(NULL) cat(readLines(rprof), sep = "\n") #> sample.interval=2 #> "sample.int" "FUN" "lapply" "sapply" "replicate" "h" "g" "f" #> "sample.int" "FUN" "lapply" "sapply" "replicate" "h" "g" "f" #> "sample.int" "FUN" "lapply" "sapply" "replicate" "h" "g" "f" The evaluation of the slow replicate() call is deferred to the execution of h(), but there's no replicate() call in h's definition. This makes parsing the profile much more difficult than necessary. I have pasted an experimental patch below (off of 3.6.2) that produces the following output: cat(readLines(rprof), sep = "\n") #> sample.interval=2 #> "sample.int" "FUN" "lapply" "sapply" "replicate" "f" #> "sample.int" "FUN" "lapply" "sapply" "replicate" "f" #> "sample.int" "FUN" "lapply" "sapply" "replicate" "f" This attributes the cost to the replicate() call to f(), where the call is actually defined. From my experience, this will give a much better understanding of the actual costs of each part of the code. The SIGPROF handler looks at sysparent and cloenv before deciding if an element of the call stack is to be included in the profile. Is there interest in integrating a variant of this patch, perhaps with an optional argument to Rprof()? Thanks! Best regards Kirill Index: src/main/eval.c === --- src/main/eval.c (revision 77857) +++ src/main/eval.c (working copy) @@ -218,7 +218,10 @@ if (R_Line_Profiling) lineprof(buf, R_getCurrentSrcref()); + SEXP sysparent = NULL; + for (cptr = R_GlobalContext; cptr; cptr = cptr->nextcontext) { + if (sysparent != NULL && cptr->cloenv != sysparent && cptr->sysparent != sysparent) continue; if ((cptr->callflag & (CTXT_FUNCTION | CTXT_BUILTIN)) && TYPEOF(cptr->call) == LANGSXP) { SEXP fun = CAR(cptr->call); @@ -292,6 +295,8 @@ else lineprof(buf, cptr->srcref); } + + sysparent = cptr->sysparent; } } } __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
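As a side note, the raw samples written above can also be summarised with summaryRprof(), which makes it easier to compare the attribution before and after the patch (using the rprof file from the example):

# Summarise the profile collected in the example above:
summaryRprof(rprof)$by.total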
[Rd] Check length of logical vector also for operands of || and &&?
Hi everyone The following behavior (in R 3.6.1 and R-devel r77040) caught me by surprise today: truthy <- c(TRUE, FALSE) falsy <- c(FALSE, TRUE, FALSE) if (truthy) "check" #> Warning in if (truthy) "check": the condition has length > 1 and only the #> first element will be used #> [1] "check" if (falsy) "check" #> Warning in if (falsy) "check": the condition has length > 1 and only the #> first element will be used if (FALSE || truthy) "check" #> [1] "check" if (FALSE || falsy) "check" if (truthy || FALSE) "check" #> [1] "check" if (falsy || FALSE) "check" The || operator gobbles the warning about a length > 1 vector. I wonder if the existing checks for length 1 can be extended to the operands of the || and && operators. Thanks (and apologies if this has been raised before). Best regards Kirill __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
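Until something changes in the engine, the intended check can at least be emulated at user level; a minimal sketch (this does not alter the built-in operators):

# Scalar-only conjunction that errors for length != 1 operands while keeping
# the short-circuit behaviour of && (illustrative only):
and2 <- function(x, y) {
  if (length(x) != 1L) stop("left operand does not have length 1")
  if (!isTRUE(as.logical(x))) return(FALSE)
  if (length(y) != 1L) stop("right operand does not have length 1")
  isTRUE(as.logical(y))
}

and2(TRUE, FALSE)       # FALSE
try(and2(falsy, TRUE))  # error instead of silently using falsy[1]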
Re: [R-pkg-devel] active bindings in package namespace
Dear Jack This doesn't answer your question, but I would advise against this design. - Users do not expect side effects (such as network access) from accessing a symbol. - A function gives you much more flexibility to change the interface later on. (Arguments for fetching the data, tokens for API access, ...) - You already encountered a few quirks that make this an "interesting" problem. A function call only needs a pair of parentheses. Best regards Kirill On 23.03.19 16:50, Jack O. Wasey wrote: Dear all, I am developing a package which is a front for various online data (icd.data https://github.com/jackwasey/icd.data/ ). The current CRAN version just has lazy-loaded data, but now the package encompasses far more current and historic ICD codes from different countries, these can't be included in the CRAN package even with maximal compression. Other authors have solved this using functions to get the data, with or without a local cache of the retrieved data. No CRAN or other packages I have found after extensive searching use the attractive active binding feature of R. The goal is simple: for the user to refer to the data by its symbol, e.g., 'icd10fr2019', or 'icd.data::icd10fr2019', and it will be downloaded and parsed transparently (if the user has already granted permission, or after prompt if they haven't). The bindings are set using commands alongside the function definitions in R/*.R .E.g. makeActiveBinding("icd10cm_latest", .icd10cm_latest_binding, environment()) lockBinding("icd10cm_latest", environment()) For non-interactive use, CI and CRAN tests, no data should be downloaded, and no cache directory set up without user consent. For interactive use, I ask permission to create a local data cache before downloading data. This works fine... until R CMD check. The following steps seems to 'get' or 'source' everything from the package namespace, which results in triggering the active bindings, and this fails if I am unable to get consent to download data, and want to 'stop' on this error condition. - checking dependencies in R code - checking S3 generic/method consistency - checking foreign function calls - checking R code for possible problems Debugging CI-specific binding bugs is a nightmare because these occur in different R sessions initiated by R CMD check. There may be legitimate reasons to evaluate everything in the namespace, but I've no idea what they are. Incidentally, Rstudio also does 'mget' on the whole package namespace and triggers bindings during autocomplete. https://github.com/rstudio/rstudio/issues/4414 Is this something I should raise as an issue with R? Or does anyone have any idea of a sensible approach to this. Currently I have a set of workarounds, but this complicates the code, and has taken an awful lot of time. Does anyone know of any CRAN package which has active bindings in the package namespace? Any ideas appreciated. Jack Wasey __ R-package-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-package-devel __ R-package-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-package-devel
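A sketch of the function-based design suggested above; the helpers for consent and download are hypothetical placeholders, only the caching pattern matters:

# Lazy, cached accessor instead of an active binding: the download happens on
# the first call (after consent) and the result is reused afterwards.
.icd_cache <- new.env(parent = emptyenv())

icd10fr2019 <- function() {
  if (is.null(.icd_cache$icd10fr2019)) {
    if (!ask_consent())  # hypothetical prompt, refused in non-interactive use
      stop("No consent to download the ICD data.")
    .icd_cache$icd10fr2019 <- download_icd10fr2019()  # hypothetical downloader
  }
  .icd_cache$icd10fr2019
}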
Re: [Rd] bias issue in sample() (PR 17494)
Ralf

I don't doubt this is expected with the current implementation; I doubt the implementation is desirable. I suggest turning this into

pbirthday(1e6, classes = 2^53)
## [1] 5.550956e-05

(which is still non-zero, but much less likely to cause confusion.)

Best regards
Kirill

On 26.02.19 10:18, Ralf Stubner wrote: Kirill, I think some level of collision is actually expected! R uses a 32bit MT that can produce 2^32 different doubles. The probability for a collision within a million draws is

pbirthday(1e6, classes = 2^32)
[1] 1

Greetings Ralf

On 26.02.19 07:06, Kirill Müller wrote: Gabe As mentioned on Twitter, I think the following behavior should be fixed as part of the upcoming changes:

R.version.string
## [1] "R Under development (unstable) (2019-02-25 r76160)"
.Machine$double.digits
## [1] 53
set.seed(123)
RNGkind()
## [1] "Mersenne-Twister" "Inversion" "Rejection"
length(table(runif(1e6)))
## [1] 999863

I don't expect any collisions when using Mersenne-Twister to generate a million floating point values. I'm not sure what causes this behavior, but it's documented in ?Random: "Do not rely on randomness of low-order bits from RNGs. Most of the supplied uniform generators return 32-bit integer values that are converted to doubles, so they take at most 2^32 distinct values and long runs will return duplicated values (Wichmann-Hill is the exception, and all give at least 30 varying bits.)" The "Wichmann-Hill" bit is interesting:

RNGkind("Wichmann-Hill")
length(table(runif(1e6)))
## [1] 1000000
length(table(runif(1e6)))
## [1] 1000000

Mersenne-Twister has a much, much larger periodicity than Wichmann-Hill; it would be great to see the above behavior also for Mersenne-Twister. Thanks for considering. Best regards Kirill

On 20.02.19 08:01, Gabriel Becker wrote: Luke, I'm happy to help with this. It's great to see this get tackled (I've cc'ed Kelli Ottoboni who helped flag this issue). I can prepare a patch for the RNGkind related stuff and the doc update. As for ???, what are your (and others') thoughts about the possibility of a) a reproducibility API which takes either an R version (or maybe alternatively a date) and sets the RNGkind to the default for that version/date, and/or b) that sessionInfo be modified to capture (and display) the RNGkind in effect. Best, ~G

On Tue, Feb 19, 2019 at 11:52 AM Tierney, Luke wrote: Before the next release we really should sort out the bias issue in sample() reported by Ottoboni and Stark in https://www.stat.berkeley.edu/~stark/Preprints/r-random-issues.pdf and filed as a bug report by Duncan Murdoch at https://bugs.r-project.org/bugzilla/show_bug.cgi?id=17494. Here are two examples of bad behavior through current R-devel:

set.seed(123)
m <- (2/5) * 2^32
x <- sample(m, 1000000, replace = TRUE)
table(x %% 2, x > m / 2)
##
##      FALSE   TRUE
##   0 300620 198792
##   1 200196 300392
table(sample(2/7 * 2^32, 1000000, replace = TRUE) %% 2)
##
##      0      1
## 429054 570946

I committed a modification to R_unif_index to address this by generating random bits (blocks of 16) and rejection sampling, but for now this is only enabled if the environment variable R_NEW_SAMPLE is set before the first call. Some things still needed:
- someone to look over the change and see if there are any issues
- adjustment of RNGkind to allow the old behavior to be selected
- make the new behavior the default
- adjust documentation
- ???
Unfortunately I don't have enough free cycles to do this, but I can help if someone else can take the lead.
There are two other places I found that might suffer from the same issue, in walker_ProbSampleReplace (pointed out by O & S) and in src/nmath/wilcox.c. Both can be addressed by using R_unif_index. I have done that for walker_ProbSampleReplace, but the wilcox change might need adjusting to support the standalone math library and I don't feel confident enough I'd get that right.

Best, luke

--
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa
Department of Statistics and Actuarial Science
241 Schaeffer Hall, Iowa City, IA 52242
Phone: 319-335-3386, Fax: 319-335-3017
email: luke-tier...@uiowa.edu
WWW: http://www.stat.uiowa.edu

__ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
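For readers of the archive, an R-level illustration of the rejection-sampling idea described above (the actual fix lives in C, in R_unif_index; this sketch only demonstrates the principle, not the implementation):

# Draw one integer uniformly from 1..n without modulo/rounding bias: generate
# just enough random bits and reject out-of-range values.
sample_one_unbiased <- function(n) {
  bits <- ceiling(log2(n))
  repeat {
    v <- sum(as.integer(runif(bits) < 0.5) * 2^(seq_len(bits) - 1))
    if (v < n) return(v + 1)
  }
}

set.seed(1)
table(replicate(1e4, sample_one_unbiased(3)))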
Re: [Rd] bias issue in sample() (PR 17494)
Gabe

As mentioned on Twitter, I think the following behavior should be fixed as part of the upcoming changes:

R.version.string
## [1] "R Under development (unstable) (2019-02-25 r76160)"
.Machine$double.digits
## [1] 53
set.seed(123)
RNGkind()
## [1] "Mersenne-Twister" "Inversion" "Rejection"
length(table(runif(1e6)))
## [1] 999863

I don't expect any collisions when using Mersenne-Twister to generate a million floating point values. I'm not sure what causes this behavior, but it's documented in ?Random: "Do not rely on randomness of low-order bits from RNGs. Most of the supplied uniform generators return 32-bit integer values that are converted to doubles, so they take at most 2^32 distinct values and long runs will return duplicated values (Wichmann-Hill is the exception, and all give at least 30 varying bits.)" The "Wichmann-Hill" bit is interesting:

RNGkind("Wichmann-Hill")
length(table(runif(1e6)))
## [1] 1000000
length(table(runif(1e6)))
## [1] 1000000

Mersenne-Twister has a much, much larger periodicity than Wichmann-Hill; it would be great to see the above behavior also for Mersenne-Twister. Thanks for considering.

Best regards
Kirill

On 20.02.19 08:01, Gabriel Becker wrote: Luke, I'm happy to help with this. It's great to see this get tackled (I've cc'ed Kelli Ottoboni who helped flag this issue). I can prepare a patch for the RNGkind related stuff and the doc update. As for ???, what are your (and others') thoughts about the possibility of a) a reproducibility API which takes either an R version (or maybe alternatively a date) and sets the RNGkind to the default for that version/date, and/or b) that sessionInfo be modified to capture (and display) the RNGkind in effect. Best, ~G

On Tue, Feb 19, 2019 at 11:52 AM Tierney, Luke wrote: Before the next release we really should sort out the bias issue in sample() reported by Ottoboni and Stark in https://www.stat.berkeley.edu/~stark/Preprints/r-random-issues.pdf and filed as a bug report by Duncan Murdoch at https://bugs.r-project.org/bugzilla/show_bug.cgi?id=17494. Here are two examples of bad behavior through current R-devel:

set.seed(123)
m <- (2/5) * 2^32
x <- sample(m, 1000000, replace = TRUE)
table(x %% 2, x > m / 2)
##
##      FALSE   TRUE
##   0 300620 198792
##   1 200196 300392
table(sample(2/7 * 2^32, 1000000, replace = TRUE) %% 2)
##
##      0      1
## 429054 570946

I committed a modification to R_unif_index to address this by generating random bits (blocks of 16) and rejection sampling, but for now this is only enabled if the environment variable R_NEW_SAMPLE is set before the first call. Some things still needed:
- someone to look over the change and see if there are any issues
- adjustment of RNGkind to allow the old behavior to be selected
- make the new behavior the default
- adjust documentation
- ???
Unfortunately I don't have enough free cycles to do this, but I can help if someone else can take the lead.

There are two other places I found that might suffer from the same issue, in walker_ProbSampleReplace (pointed out by O & S) and in src/nmath/wilcox.c. Both can be addressed by using R_unif_index. I have done that for walker_ProbSampleReplace, but the wilcox change might need adjusting to support the standalone math library and I don't feel confident enough I'd get that right.

Best, luke

--
Luke Tierney
Ralph E.
Wareham Professor of Mathematical Sciences
University of Iowa
Department of Statistics and Actuarial Science
241 Schaeffer Hall, Iowa City, IA 52242
Phone: 319-335-3386, Fax: 319-335-3017
email: luke-tier...@uiowa.edu
WWW: http://www.stat.uiowa.edu

__ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
[Rd] Dots are not fixed by make.names()
Hi It seems that names of the form "..#" and "..." are not fixed by make.names(), even though they are reserved words. The documentation reads: > [...] Names such as ".2way" are not valid, and neither are the reserved words. > Reserved words in R: [...] ... and ..1, ..2 etc, which are used to refer to arguments passed down from a calling function, see ?... . I have pasted a reproducible example below. I'd like to suggest to convert these to "...#" and "", respectively. Happy to contribute PR. Best regards Kirill make.names(c("..1", "..13", "...")) #> [1] "..1" "..13" "..." `..1` <- 1 `..13` <- 13 `...` <- "dots" mget(c("..1", "..13", "...")) #> $..1 #> [1] 1 #> #> $..13 #> [1] 13 #> #> $... #> [1] "dots" `..1` #> Error in eval(expr, envir, enclos): the ... list does not contain any elements `..13` #> Error in eval(expr, envir, enclos): the ... list does not contain 13 elements `...` #> Error in eval(expr, envir, enclos): '...' used in an incorrect context __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
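A small user-level check for the names in question, for illustration (the helper and its name are not part of any proposal):

# "..." and "..1", "..2", ... are reserved words but currently survive
# make.names() unchanged:
is_reserved_dots <- function(x) grepl("^\\.\\.(\\.|[0-9]+)$", x)

is_reserved_dots(c("..1", "..13", "...", ".2way", "x"))
#> [1]  TRUE  TRUE  TRUE FALSE FALSE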
Re: [Rd] Usage of PROTECT_WITH_INDEX in R-exts
On 09.06.2017 13:23, Martin Maechler wrote: Kirill Müller <kirill.muel...@ivt.baug.ethz.ch> on Thu, 8 Jun 2017 12:55:26 +0200 writes: > On 06.06.2017 22:14, Kirill Müller wrote: >> >> >> On 06.06.2017 10:07, Martin Maechler wrote: >>>>>>>> Kirill Müller <kirill.muel...@ivt.baug.ethz.ch> on >>>>>>>> Mon, 5 Jun 2017 17:30:20 +0200 writes: >>> > Hi I've noted a minor inconsistency in the >>> documentation: > Current R-exts reads >>> >>> > s = PROTECT_WITH_INDEX(eval(OS->R_fcall, OS->R_env), >>> ); >>> >>> > but I believe it has to be >>> >>> > PROTECT_WITH_INDEX(s = eval(OS->R_fcall, OS->R_env), >>> ); >>> >>> > because PROTECT_WITH_INDEX() returns void. >>> >>> Yes indeed, thank you Kirill! >>> >>> note that the same is true for its partner >>> function|macro REPROTECT() >>> >>> However, as PROTECT() is used a gazillion times and >>> PROTECT_WITH_INDEX() is used about 100 x less, and >>> PROTECT() *does* return the SEXP, I do wonder why >>> PROTECT_WITH_INDEX() and REPROTECT() could not behave >>> the same as PROTECT() (a view at the source code seems >>> to suggest a change to be trivial). I assume usual >>> compiler optimization would not create less efficient >>> code in case the idiom PROTECT_WITH_INDEX(s = ...) is >>> used, i.e., in case the return value is not used ? >>> >>> Maybe this is mainly a matter of taste, but I find the >>> use of >>> >>> SEXP s = PROTECT(); >>> >>> quite nice in typical cases where this appears early in >>> a function. Also for that reason -- but even more for >>> consistency -- it would also be nice if >>> PROTECT_WITH_INDEX() behaved the same. >> Thanks, Martin, this sounds reasonable. I've put together >> a patch for review [1], a diff for applying to SVN (via >> `cat | patch -p1`) would be [2]. The code compiles on my >> system. >> >> >> -Kirill >> >> >> [1] https://github.com/krlmlr/r-source/pull/5/files >> >> [2] >> https://patch-diff.githubusercontent.com/raw/krlmlr/r-source/pull/5.diff > I forgot to mention that this patch applies cleanly to r72768. Thank you, Kirill. I've been a bit busy so did not get to reply more quickly. Just to be clear: I did not ask for a patch but was _asking_ / requesting comments about the possibility to do that. In the mean time, within the core team, the opinions were mixed and costs of the change (recompilations needed, C source level check tools would need updating / depend on R versions) are clearly non-zero. As a consquence, we will fix the documentation, rather than changing the API. Thanks for looking into this. The patch was more a proof of concept, I don't mind throwing it away. -Kirill Martin __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] Usage of PROTECT_WITH_INDEX in R-exts
On 06.06.2017 22:14, Kirill Müller wrote: On 06.06.2017 10:07, Martin Maechler wrote: Kirill Müller <kirill.muel...@ivt.baug.ethz.ch> on Mon, 5 Jun 2017 17:30:20 +0200 writes: > Hi I've noted a minor inconsistency in the documentation: > Current R-exts reads > s = PROTECT_WITH_INDEX(eval(OS->R_fcall, OS->R_env), ); > but I believe it has to be > PROTECT_WITH_INDEX(s = eval(OS->R_fcall, OS->R_env), ); > because PROTECT_WITH_INDEX() returns void. Yes indeed, thank you Kirill! note that the same is true for its partner function|macro REPROTECT() However, as PROTECT() is used a gazillion times and PROTECT_WITH_INDEX() is used about 100 x less, and PROTECT() *does* return the SEXP, I do wonder why PROTECT_WITH_INDEX() and REPROTECT() could not behave the same as PROTECT() (a view at the source code seems to suggest a change to be trivial). I assume usual compiler optimization would not create less efficient code in case the idiom PROTECT_WITH_INDEX(s = ...) is used, i.e., in case the return value is not used ? Maybe this is mainly a matter of taste, but I find the use of SEXP s = PROTECT(); quite nice in typical cases where this appears early in a function. Also for that reason -- but even more for consistency -- it would also be nice if PROTECT_WITH_INDEX() behaved the same. Thanks, Martin, this sounds reasonable. I've put together a patch for review [1], a diff for applying to SVN (via `cat | patch -p1`) would be [2]. The code compiles on my system. -Kirill [1] https://github.com/krlmlr/r-source/pull/5/files [2] https://patch-diff.githubusercontent.com/raw/krlmlr/r-source/pull/5.diff I forgot to mention that this patch applies cleanly to r72768. -Kirill Martin > Best regards > Kirill __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] Usage of PROTECT_WITH_INDEX in R-exts
On 06.06.2017 10:07, Martin Maechler wrote: Kirill Müller <kirill.muel...@ivt.baug.ethz.ch> on Mon, 5 Jun 2017 17:30:20 +0200 writes: > Hi I've noted a minor inconsistency in the documentation: > Current R-exts reads > s = PROTECT_WITH_INDEX(eval(OS->R_fcall, OS->R_env), ); > but I believe it has to be > PROTECT_WITH_INDEX(s = eval(OS->R_fcall, OS->R_env), ); > because PROTECT_WITH_INDEX() returns void. Yes indeed, thank you Kirill! note that the same is true for its partner function|macro REPROTECT() However, as PROTECT() is used a gazillion times and PROTECT_WITH_INDEX() is used about 100 x less, and PROTECT() *does* return the SEXP, I do wonder why PROTECT_WITH_INDEX() and REPROTECT() could not behave the same as PROTECT() (a view at the source code seems to suggest a change to be trivial). I assume usual compiler optimization would not create less efficient code in case the idiom PROTECT_WITH_INDEX(s = ...) is used, i.e., in case the return value is not used ? Maybe this is mainly a matter of taste, but I find the use of SEXP s = PROTECT(); quite nice in typical cases where this appears early in a function. Also for that reason -- but even more for consistency -- it would also be nice if PROTECT_WITH_INDEX() behaved the same. Thanks, Martin, this sounds reasonable. I've put together a patch for review [1], a diff for applying to SVN (via `cat | patch -p1`) would be [2]. The code compiles on my system. -Kirill [1] https://github.com/krlmlr/r-source/pull/5/files [2] https://patch-diff.githubusercontent.com/raw/krlmlr/r-source/pull/5.diff Martin > Best regards > Kirill __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
[Rd] Usage of PROTECT_WITH_INDEX in R-exts
Hi I've noted a minor inconsistency in the documentation: Current R-exts reads s = PROTECT_WITH_INDEX(eval(OS->R_fcall, OS->R_env), ); but I believe it has to be PROTECT_WITH_INDEX(s = eval(OS->R_fcall, OS->R_env), ); because PROTECT_WITH_INDEX() returns void. Best regards Kirill __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] source(), parse(), and foreign UTF-8 characters
On 09.05.2017 13:19, Duncan Murdoch wrote: On 09/05/2017 3:42 AM, Kirill Müller wrote: Hi I'm having trouble sourcing or parsing a UTF-8 file that contains characters that are not representable in the current locale ("foreign characters") on Windows. The source() function stops with an error, the parse() function reencodes all foreign characters using the <U+> notation. I have added a reproducible example below the message. This seems well within the bounds of documented behavior, although the documentation to source() could mention that the file can't contain foreign characters. Still, I'd prefer if UTF-8 "just worked" in R, and I'm willing to invest substantial time to help with that. Before starting to write a detailed proposal, I feel that I need a better understanding of the problem, and I'm grateful for any feedback you might have. I have looked into character encodings in the context of the dplyr package, and I have observed the following behavior: - Strings are treated preferentially in the native encoding - Only upon specific request (via translateCharUTF8() or enc2utf8() or ...), they are translated to UTF-8 and marked as such - On UTF-8 systems, strings are never marked as UTF-8 - ASCII strings are marked as ASCII internally, but this information doesn't seem to be available, e.g., Encoding() returns "unknown" for such strings - Most functions in R are encoding-agnostic: they work the same regardless if they receive a native or UTF-8 encoded string if they are properly tagged - One important difference are symbols, which must be in the native encoding (and are always converted to native encoding, using <U+> escapes) - I/O is centered around the native encoding, e.g., writeLines() always reencodes to the native encoding - There is the "bytes" encoding which avoids reencoding. I haven't looked into serialization or plot devices yet. The conclusion to the "UTF-8 manifesto" [1] suggests "... to use UTF-8 narrow strings everywhere and convert them back and forth when using platform APIs that don’t support UTF-8 ...". (It is written in the context of the UTF-16 encoding used internally on Windows, but seems to apply just the same here for the native encoding.) I think that Unicode support in R could be greatly improved if we follow these guidelines. This seems to mean: - Convert strings to UTF-8 as soon as possible, and mark them as such (also on systems where UTF-8 is the native encoding) - Translate to native only upon specific request, e.g., in calls to API functions or perhaps for .C() - Use UTF-8 for symbols - Avoid the forced round-trip to the native encoding in I/O functions and for parsing (but still read/write native by default) - Carefully look into serialization and plot devices - Add helper functions that simplify mundane tasks such as reading/writing a UTF-8 encoded file Those are good long term goals, though I think the effort is easier than you think. Rather than attempting to do it all at once, you should look for ways to do it gradually and submit self-contained patches. In many cases it doesn't matter if strings are left in the local encoding, because the encoding doesn't matter. The problems arise when UTF-8 strings are converted to the local encoding before it's necessary, because that's a lossy conversion. So a simple way to proceed is to identify where these conversions occur, and remove them one-by-one. Thanks, Duncan, this looks like a good start indeed. Did you really mean to say "the effort is easier than I think"? 
It would be great if I had overestimated the effort, I seldom do. That said, I'd be grateful if you could review/integrate/... future patches of mine towards parsing and sourcing of UTF-8 files with foreign characters, this problem seems to be self-contained (but perhaps not that easy). I still think symbols should be in UTF-8, and this change might be difficult to split into smaller changes, especially if taking into account serialization and other potential pitfalls. Currently I'm working on bug 16098, "Windows doesn't handle high Unicode code points". It doesn't require many changes at all to handle input of those characters; all the remaining issues are avoiding the problems you identify above. The origin of the issue is the fact that in Windows wchar_t is only 16 bits (not big enough to hold all Unicode code points). As far as I know, Windows has no standard type to hold a Unicode code point, most of the run-time functions still use the 16 bit wchar_t. I didn't mention non-BMP characters, they are an important issue as well. I think once that bug is dealt with, 90+% of the remaining issues could be solved by avoiding translateChar on Windows. This could be done by avoiding it everywhere, or by acting as though Windows is running in a UTF-8 locale until you actually need to write to a file. Other s
[Rd] source(), parse(), and foreign UTF-8 characters
Hi I'm having trouble sourcing or parsing a UTF-8 file that contains characters that are not representable in the current locale ("foreign characters") on Windows. The source() function stops with an error, the parse() function reencodes all foreign characters using the notation. I have added a reproducible example below the message. This seems well within the bounds of documented behavior, although the documentation to source() could mention that the file can't contain foreign characters. Still, I'd prefer if UTF-8 "just worked" in R, and I'm willing to invest substantial time to help with that. Before starting to write a detailed proposal, I feel that I need a better understanding of the problem, and I'm grateful for any feedback you might have. I have looked into character encodings in the context of the dplyr package, and I have observed the following behavior: - Strings are treated preferentially in the native encoding - Only upon specific request (via translateCharUTF8() or enc2utf8() or ...), they are translated to UTF-8 and marked as such - On UTF-8 systems, strings are never marked as UTF-8 - ASCII strings are marked as ASCII internally, but this information doesn't seem to be available, e.g., Encoding() returns "unknown" for such strings - Most functions in R are encoding-agnostic: they work the same regardless if they receive a native or UTF-8 encoded string if they are properly tagged - One important difference are symbols, which must be in the native encoding (and are always converted to native encoding, using escapes) - I/O is centered around the native encoding, e.g., writeLines() always reencodes to the native encoding - There is the "bytes" encoding which avoids reencoding. I haven't looked into serialization or plot devices yet. The conclusion to the "UTF-8 manifesto" [1] suggests "... to use UTF-8 narrow strings everywhere and convert them back and forth when using platform APIs that don’t support UTF-8 ...". (It is written in the context of the UTF-16 encoding used internally on Windows, but seems to apply just the same here for the native encoding.) I think that Unicode support in R could be greatly improved if we follow these guidelines. This seems to mean: - Convert strings to UTF-8 as soon as possible, and mark them as such (also on systems where UTF-8 is the native encoding) - Translate to native only upon specific request, e.g., in calls to API functions or perhaps for .C() - Use UTF-8 for symbols - Avoid the forced round-trip to the native encoding in I/O functions and for parsing (but still read/write native by default) - Carefully look into serialization and plot devices - Add helper functions that simplify mundane tasks such as reading/writing a UTF-8 encoded file I'm sure I've missed many potential pitfalls, your input is greatly appreciated. Thanks for your attention. Further ressources: A write-up by Prof. Ripley [2], a section in R-ints [3], a blog post by Ista Zahn [4], a StackOverflow search [5]. 
Best regards
Kirill

[1] http://utf8everywhere.org/#conclusions
[2] https://developer.r-project.org/Encodings_and_R.html
[3] https://cran.r-project.org/doc/manuals/r-devel/R-ints.html#Encodings-for-CHARSXPs
[4] http://people.fas.harvard.edu/~izahn/posts/reading-data-with-non-native-encoding-in-r/
[5] http://stackoverflow.com/search?tab=votes=%5br%5d%20encoding%20windows%20is%3aquestion

# Use one of the following:
id <- "Gl\u00fcck"
id <- "\u5e78\u798f"
id <- "\u0441\u0447\u0430\u0441\u0442\u044c\u0435"
id <- "\ud589\ubcf5"

file_contents <- paste0('"', id, '"')
Encoding(file_contents)

raw_file_contents <- charToRaw(file_contents)
path <- tempfile(fileext = ".R")
writeBin(raw_file_contents, path)
file.size(path)
length(raw_file_contents)

# Escapes the string
parse(text = file_contents)

# Throws an error
print(source(path, encoding = "UTF-8"))

__ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
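A sketch of the kind of convenience helper mentioned above for reading a UTF-8 encoded file (the name is illustrative; note that the encoding argument of readLines() only declares the encoding of the input, it does not re-encode):

read_lines_utf8 <- function(path) {
  readLines(path, encoding = "UTF-8", warn = FALSE)
}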
Re: [Rd] Upgrading a package to which other packages are LinkingTo
Thanks for discussing this. On 16.12.2016 17:19, Dirk Eddelbuettel wrote: On 16 December 2016 at 11:00, Duncan Murdoch wrote: | On 16/12/2016 10:40 AM, Dirk Eddelbuettel wrote: | > On 16 December 2016 at 10:14, Duncan Murdoch wrote: | > | On 16/12/2016 8:37 AM, Dirk Eddelbuettel wrote: | > | > | > | > On 16 December 2016 at 08:20, Duncan Murdoch wrote: | > | > | Perhaps the solution is to recommend that packages which export their | > | > | C-level entry points either guarantee them not to change or offer | > | > | (require?) version checks by user code. So dplyr should start out by | > | > | saying "I'm using Rcpp interface 0.12.8". If Rcpp has a new version | > | > | with a compatible interface, it replies "that's fine". If Rcpp has | > | > | changed its interface, it says "Sorry, I don't support that any more." Sounds good to me, I was considering something similar. dplyr can simply query Rcpp's current version in .onLoad(), compare it to the version at installation time and act accordingly. | > | > | > | > We try. But it's hard, and I'd argue, likely impossible. | > | > | > | > For example I even added a "frozen" package [1] in the sources / unit tests | > | > to test for just this. In practice you just cannot hit every possible access | > | > point of the (rich, in our case) API so the tests pass too often. | > | > | > | > Which is why we relentlessly test against reverse-depends to _at least ensure | > | > buildability_ from our releases. | > | > I meant to also add: "... against a large corpus of other packages." | > The intent is to empirically answer this. | > | > | > As for seamless binary upgrade, I don't think in can work in practice. Ask | > | > Uwe one day we he rebuilds everything every time on Windows. And for what it | > | > is worth, we essentially do the same in Debian. | > | > | > | > Sometimes you just need to rebuild. That may be the price of admission for | > | > using the convenience of rich C++ interfaces. | > | > | > | | > | Okay, so would you say that Kirill's suggestion is not overkill? Every | > | time package B uses LinkingTo: A, R should assume it needs to rebuild B | > | when A is updated? | > | > Based on my experience is a "halting problem" -- i.e. cannot know ex ante. | > | > So "every time" would be overkill to me. Sometimes you know you must | > recompile (but try to be very prudent with public-facing API). Many times | > you do not. It is hard to pin down. I'd argue that recompiling/reinstalling B is cheap enough and the safest option. So unless there is a risk, why not simply do it every time A updates? This could be implemented with a perhaps small change in R: When installing A, treat all packages that have A in both LinkingTo and Imports as dependencies that need to be reinstalled. -Kirill | > | > At work we have a bunch of servers with Rcpp and many packages against them | > (installed system-wide for all users). We _very really_ needs rebuild. Edit: "We _very rarely_ need rebuilds" is what was meant there. | So that comes back to my suggestion: you should provide a way for a | dependent package to ask if your API has changed. If you say it hasn't, | the package is fine. If you say it has, the package should abort, | telling the user they need to reinstall it. (Because it's a hard | question to answer, you might get it wrong and say it's fine when it's | not. But that's easy to fix: just make a new release that does require Sure. We have always increased the higher-order version number when that is needed. 
One problem with your proposal is that the testing code may run after the package load, and in the case where it matters ... that very code may not get reached because the package didn't load. Dirk __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
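A sketch of the load-time check discussed above, from the dependent package's side; the stored version string would have to be recorded at install time (e.g. in a generated file), and both it and the message are illustrative:

.onLoad <- function(libname, pkgname) {
  rcpp_at_build_time <- "0.12.8"  # recorded when this package was installed
  if (utils::packageVersion("Rcpp") < rcpp_at_build_time) {
    stop("This package was built against Rcpp ", rcpp_at_build_time,
         "; please reinstall it against the current Rcpp.", call. = FALSE)
  }
}

A fuller check would also ask Rcpp whether its interface has changed since the recorded version, which is the hard part discussed in this thread.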
Re: [Rd] withAutoprint({ .... }) ?
On 25.09.2016 18:29, Martin Maechler wrote: I'm now committing my version (including (somewhat incomplete) documentation), so you (all) can look at it and try / test it further.

Thanks, that's awesome. Is `withAutoprint()` recursive? How about calling the new function in `example()` (instead of `source()` as it is now) so that examples are always rendered in auto-print mode? That may add some extra output to examples (which can be removed easily), but it would solve the original problem in a painless way.

-Kirill

__ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
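For reference, the committed function (available since R 3.4.0) is used like this:

withAutoprint({
  x <- rnorm(3)
  x            # auto-printed, as at top level
  summary(x)   # auto-printed
})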
Re: [Rd] withAutoprint({ .... }) ?
On 02.09.2016 14:38, Duncan Murdoch wrote: On 02/09/2016 7:56 AM, Martin Maechler wrote: On R-help, with subject '[R] source() does not include added code' Joshua Ulrichon Wed, 31 Aug 2016 10:35:01 -0500 writes: > I have quantstrat installed and it works fine for me. If you're > asking why the output of t(tradeStats('macross')) isn't being printed, > that's because of what's described in the first paragraph in the > *Details* section of help("source"): > Note that running code via ‘source’ differs in a few respects from > entering it at the R command line. Since expressions are not > executed at the top level, auto-printing is not done. So you will > need to include explicit ‘print’ calls for things you want to be > printed (and remember that this includes plotting by ‘lattice’, > FAQ Q7.22). > So you need: > print(t(tradeStats('macross'))) > if you want the output printed to the console. indeed, and "of course"" ;-) As my subject indicates, this is another case, where it would be very convenient to have a function withAutoprint() so the OP could have (hopefully) have used withAutoprint(source(..)) though that would have been equivalent to the already nicely existing source(.., print.eval = TRUE) which works via the withVisible(.) utility that returns for each 'expression' if it would auto print or not, and then does print (or not) accordingly. My own use cases for such a withAutoprint({...}) are demos and examples, sometimes even package tests which I want to print: Assume I have a nice demo / example on a help page/ ... foo(..) (z <- bar(..)) summary(z) where I carefully do print parts (and don't others), and suddenly I find I want to run that part of the demo / example / test only in some circumstances, e.g., only when interactive, but not in BATCH, or only if it is me, the package maintainer, if( identical(Sys.getenv("USER"), "maechler") ) { foo(..) (z <- bar(..)) summary(z) } Now all the auto-printing is gone, and 1) I have to find out which of these function calls do autoprint and wrap a print(..) around these, and 2) the result is quite ugly (for an example on a help page etc.) What I would like in a future R, is to be able to simply wrap the "{ .. } above with an 'withAutoprint(.) : if( identical(Sys.getenv("USER"), "maechler") ) withAutoprint({ foo(..) (z <- bar(..)) summary(z) }) Conceptually such a function could be written similar to source() with an R level for loop, treating each expression separately, calling eval(.) etc. That may cost too much performnace, ... still to have it would be better than not having the possibility. If you read so far, you'd probably agree that such a function could be a nice asset in R, notably if it was possible to do this on the fast C level of R's main REPL. Have any of you looked into how this could be provided in R ? If you know the source a little, you will remember that there's the global variable R_Visible which is crucial here. The problem with that is that it *is* global, and only available as that; that the auto-printing "concept" is so linked to "toplevel context" and that is not easy, and AFAIK not so much centralized in one place in the source. Consequently, all kind of (very) low level functions manipulate R_Visible temporarily and so a C level implementation of withAutoprint() may need considerable more changes than just setting R_Visible to TRUE in one place. Have any efforts / experiments already happened towards providing such functionality ? I don't think the performance cost would matter. 
If you're printing something, you're already slow. So doing this at the R level would make most sense to me --- that's how Sweave and source and knitr do it, so it can't be that bad. Duncan Murdoch A C-level implementation would bring the benefit of a lean traceback() in case of an error. I suspect eval() could be enhanced to auto-print. By the same token it would be extremely helpful to have a C-level implementation of local() which wouldn't litter the stack trace. -Kirill __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] R process killed when allocating too large matrix (Mac OS X)
On 12.05.2016 09:51, Martin Maechler wrote:
> My ulimit package exposes this API ([1], should finally submit it to
> CRAN); unfortunately this very API seems to be unsupported on OS X
> [2,3]. Last time I looked into it, neither of the documented settings
> achieved the desired effect.
> -Kirill
> [1] http://krlmlr.github.io/ulimit
> [2] http://stackoverflow.com/questions/3274385/how-to-limit-memory-of-a-os-x-program-ulimit-v-neither-m-are-working
> [3] https://developer.apple.com/library/ios/documentation/System/Conceptual/ManPages_iPhoneOS/man2/getrlimit.2.html
...
In an ideal world, some of us, from R core, Jeroen, Kirill, maintainer("microbenchmark"), ... would sit together and devise an R function interface (based on low-level platform-specific interfaces, specifically for at least Linux/POSIX-compliant, Mac, and Windows) which would allow something like your rlimit(..) calls below. We'd really need something to work on all platforms ideally, to be used by R package maintainers and possibly even better by R itself at startup, setting a reasonable memory cap - which the user could raise even to +Inf (or lower even more).

I haven't found a Windows API that allows limiting the address space, only one that limits the working set size; it seems likely that this is the best we can get on OS X, too, but then my experience with OS X is very limited. mallinfo() is used on Windows and seems to be available on Linux, too, but not on OS X.

-Kirill

__ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
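To make the shape of the proposal concrete, here is a sketch of what such an R-level interface might look like; nothing below exists, and both helper names are placeholders for the platform-specific calls discussed above:

mem_cap <- function(bytes) {
  stopifnot(is.numeric(bytes), length(bytes) == 1, bytes > 0)
  if (.Platform$OS.type == "windows") {
    set_working_set_limit(bytes)  # placeholder: Windows working-set limit
  } else {
    set_rlimit_as(bytes)          # placeholder: setrlimit(RLIMIT_AS, ...)
  }
  invisible(bytes)
}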
Re: [Rd] R process killed when allocating too large matrix (Mac OS X)
My ulimit package exposes this API ([1], should finally submit it to CRAN); unfortunately this very API seems to be unsupported on OS X [2,3]. Last time I looked into it, neither of the documented settings achieved the desired effect. -Kirill [1] http://krlmlr.github.io/ulimit [2] http://stackoverflow.com/questions/3274385/how-to-limit-memory-of-a-os-x-program-ulimit-v-neither-m-are-working [3] https://developer.apple.com/library/ios/documentation/System/Conceptual/ManPages_iPhoneOS/man2/getrlimit.2.html On 10.05.2016 01:08, Jeroen Ooms wrote: On 05/05/2016 10:11, Uwe Ligges wrote: Actually this also happens under Linux and I had my R processes killed more than once (and much worse also other processes so that we had to reboot a server, essentially). I found that setting RLIMIT_AS [1] works very well on Linux. But this requires that you cap memory to some fixed value. library(RAppArmor) rlimit_as(1e9) rnorm(1e9) Error: cannot allocate vector of size 7.5 Gb The RAppArmor package has many other utilities to protect your server such from a mis-behaving process such as limiting cpu time (RLIMIT_CPU), fork bombs (RLIMIT_NPROC) and file sizes (RLIMIT_FSIZE). [1] http://linux.die.net/man/2/getrlimit __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
[Rd] Regression in match() in R 3.3.0 when matching strings with different character encodings
Hi I think the following behavior is a regression from R 3.2.5: > match(iconv( c("\u00f8", "A"), from = "UTF8", to = "latin1" ), "\u00f8") [1] 1 NA > match(iconv( c("\u00f8"), from = "UTF8", to = "latin1" ), "\u00f8") [1] NA > match(iconv( c("\u00f8"), from = "UTF8", to = "latin1" ), "\u00f8", incomparables = NA) [1] 1 I'm seeing this in R 3.3.0 on both Windows and Ubuntu 15.10. The specific behavior makes me think this is related to the following NEWS entry: match(x, table) is faster (sometimes by an order of magnitude) when x is of length one and incomparables is unchanged (PR#16491). Best regards Kirill __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] S3 dispatch for S4 subclasses only works if variable "extends" is accessible from global environment
Thanks for looking into it, your approach sounds good to me. See also R_has_methods_attached() (https://github.com/wch/r-source/blob/42ecf5f492a005f5398cbb4c9becd4aa5af9d05c/src/main/objects.c#L258-L265). I'm fine with Rscript not loading "methods", as long as everything works properly with "methods" loaded but not attached. -Kirill On 19.04.2016 04:10, Michael Lawrence wrote: Right, the methods package is not attached by default when running R with Rscript. We should probably remove that special case, as it mostly just leads to confusion, but that won't happen immediately. For now, the S4_extends() should probably throw an error when the methods namespace is not loaded. And the check should be changed to directly check whether R_MethodsNamespace has been set to something other than the default (R_GlobalEnv). Agreed? On Mon, Apr 18, 2016 at 4:35 PM, Kirill Müller <kirill.muel...@ivt.baug.ethz.ch> wrote: Scenario: An S3 method is declared for an S4 base class but called for an instance of a derived class. Steps to reproduce: Rscript -e "test <- function(x) UseMethod('test', x); test.Matrix <- function(x) 'Hi'; MatrixDispatchTest::test(Matrix::Matrix())" Error in UseMethod("test", x) : no applicable method for 'test' applied to an object of class "lsyMatrix" Calls: 1: MatrixDispatchTest::test(Matrix::Matrix()) Rscript -e "extends <- 42; test <- function(x) UseMethod('test', x); test.Matrix <- function(x) 'Hi'; MatrixDispatchTest::test(Matrix::Matrix())" [1] "Hi" To me, it looks like a sanity check in line 655 of src/main/attrib.c is making wrong assumptions, but there might be other reasons. (https://github.com/wch/r-source/blob/780021752eb83a71e2198019acf069ba8741103b/src/main/attrib.c#L655-L656) Same behavior in R 3.2.4, R 3.2.5 and R-devel r70420. Best regards Kirill __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] S3 dispatch for S4 subclasses only works if variable "extends" is accessible from global environment
Please omit "MatrixDispatchTest::" from the test scripts: Rscript -e "test <- function(x) UseMethod('test', x); test.Matrix <- function(x) 'Hi'; test(Matrix::Matrix())" Rscript -e "extends <- 42; test <- function(x) UseMethod('test', x); test.Matrix <- function(x) 'Hi'; test(Matrix::Matrix())" -Kirill On 19.04.2016 01:35, Kirill Müller wrote: Scenario: An S3 method is declared for an S4 base class but called for an instance of a derived class. Steps to reproduce: > Rscript -e "test <- function(x) UseMethod('test', x); test.Matrix <- function(x) 'Hi'; MatrixDispatchTest::test(Matrix::Matrix())" Error in UseMethod("test", x) : no applicable method for 'test' applied to an object of class "lsyMatrix" Calls: 1: MatrixDispatchTest::test(Matrix::Matrix()) > Rscript -e "extends <- 42; test <- function(x) UseMethod('test', x); test.Matrix <- function(x) 'Hi'; MatrixDispatchTest::test(Matrix::Matrix())" [1] "Hi" To me, it looks like a sanity check in line 655 of src/main/attrib.c is making wrong assumptions, but there might be other reasons. (https://github.com/wch/r-source/blob/780021752eb83a71e2198019acf069ba8741103b/src/main/attrib.c#L655-L656) Same behavior in R 3.2.4, R 3.2.5 and R-devel r70420. Best regards Kirill __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
[Rd] S3 dispatch for S4 subclasses only works if variable "extends" is accessible from global environment
Scenario: An S3 method is declared for an S4 base class but called for an instance of a derived class. Steps to reproduce: > Rscript -e "test <- function(x) UseMethod('test', x); test.Matrix <- function(x) 'Hi'; MatrixDispatchTest::test(Matrix::Matrix())" Error in UseMethod("test", x) : no applicable method for 'test' applied to an object of class "lsyMatrix" Calls: 1: MatrixDispatchTest::test(Matrix::Matrix()) > Rscript -e "extends <- 42; test <- function(x) UseMethod('test', x); test.Matrix <- function(x) 'Hi'; MatrixDispatchTest::test(Matrix::Matrix())" [1] "Hi" To me, it looks like a sanity check in line 655 of src/main/attrib.c is making wrong assumptions, but there might be other reasons. (https://github.com/wch/r-source/blob/780021752eb83a71e2198019acf069ba8741103b/src/main/attrib.c#L655-L656) Same behavior in R 3.2.4, R 3.2.5 and R-devel r70420. Best regards Kirill __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [R-pkg-devel] Scripts to generate data objects
The devtools::use_data_raw() function creates a "data-raw" directory for this purpose, and adds it to .Rbuildignore so that it's not included in the built package. Your scripts can then write the data to the proper place using devtools::use_data(). -Kirill On 30.03.2016 14:03, Iago Mosqueira wrote: Hello, What is the best way of keeping R scripts that are used to generate the data files in the data/ folder? These are not meant to be available to the user, but I would like to keep them in the package itself. Right now I am storing them inside data/, for example PKG/data/datasetone.R to create PKG/data/dataseton.RData, and then adding those R files to .Rbuildignore. Are there any other sensible ways of doing this? Thanks, Iago [[alternative HTML version deleted]] __ R-package-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-package-devel __ R-package-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-package-devel
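A minimal example of that layout (file and object names are illustrative):

# data-raw/datasetone.R -- kept in the source package, excluded from the
# build via .Rbuildignore; running it (re)creates data/datasetone.rda:
datasetone <- read.csv("data-raw/datasetone.csv", stringsAsFactors = FALSE)
devtools::use_data(datasetone, overwrite = TRUE)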
[Rd] DESCRIPTION file: Space after colon mandatory?
According to R-exts, DESCRIPTION is a DCF variant, and " Fields start with an ASCII name immediately followed by a colon: the value starts after the colon and a space." However, according to the linked https://www.debian.org/doc/debian-policy/ch-controlfields.html, horizontal space before and after a value are trimmed, this is also the behavior of read.dcf(). Is this an omission in the documentation, or is the space after the colon actually required? Thanks. Best regards Kirill __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
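The behaviour in question is easy to probe directly; a throwaway example:

# Write a DESCRIPTION-like file without a space after the colon and with
# extra space before a value, then see what read.dcf() returns:
tf <- tempfile()
writeLines(c("Package:mypkg", "Version:   1.0.0"), tf)
read.dcf(tf)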
Re: [Rd] getParseData() for installed packages
On 10.03.2016 16:05, Duncan Murdoch wrote: On 10/03/2016 9:53 AM, Kirill Müller wrote: On 10.03.2016 15:49, Duncan Murdoch wrote: I install using R CMD INSTALL ., and I have options(keep.source = TRUE, keep.source.pkgs = TRUE) in my .Rprofile . The srcrefs are all there, it's just that the parse data is not where I'd expect it to be. Okay, I see what you describe. I'm not going to have time to track this down for a while, so I'm going to post your message as a bug report, and hopefully will be able to get to it before 3.3.0. Thanks. A related note: Would it be possible to make available all of first_byte/last_byte/first_column/last_column in the parse data, for easier srcref reconstruction? -Kirill __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] getParseData() for installed packages
On 10.03.2016 15:49, Duncan Murdoch wrote: On 10/03/2016 8:27 AM, Kirill Müller wrote: I can't seem to reliably obtain parse data via getParseData() for functions from installed packages. The parse data seems to be available only for the *last* file in the package. See [1] for a small example package with just two functions f and g in two files a.R and b.R. See [2] for a documented test run on installed package (Ubuntu 15.10, UTF-8 locale, R 3.2.3). Same behavior with r-devel (r70303). The parse data helps reliable coverage analysis [3]. Please advise. You don't say how you built the package. Parse data is omitted by default. Duncan Murdoch I install using R CMD INSTALL ., and I have options(keep.source = TRUE, keep.source.pkgs = TRUE) in my .Rprofile . The srcrefs are all there, it's just that the parse data is not where I'd expect it to be. -Kirill Best regards Kirill [1] https://github.com/krlmlr/covr.dummy [2] http://rpubs.com/krlmlr/getParseData [3] https://github.com/jimhester/covr/pull/154 __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
[Rd] getParseData() for installed packages
I can't seem to reliably obtain parse data via getParseData() for functions from installed packages. The parse data seems to be available only for the *last* file in the package. See [1] for a small example package with just two functions f and g in two files a.R and b.R. See [2] for a documented test run on installed package (Ubuntu 15.10, UTF-8 locale, R 3.2.3). Same behavior with r-devel (r70303). The parse data helps reliable coverage analysis [3]. Please advise. Best regards Kirill [1] https://github.com/krlmlr/covr.dummy [2] http://rpubs.com/krlmlr/getParseData [3] https://github.com/jimhester/covr/pull/154 __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
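For reference, a sketch of how the parse data would be retrieved from the installed example package [1], assuming it was installed with the source kept (options(keep.source.pkgs = TRUE) or R_KEEP_PKG_SOURCE=yes):
  f <- getFromNamespace("f", "covr.dummy")   # defined in a.R
  g <- getFromNamespace("g", "covr.dummy")   # defined in b.R
  str(utils::getParseData(f))   # NULL in the failing case described above
  str(utils::getParseData(g))   # only the last file's parse data seems to be available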
Re: [R-pkg-devel] Namespace error
It's difficult to tell without seeing the source code, but the NAMESPACE you posted doesn't seem to contain "View". This file usually gets updated when you call devtools::document() or roxygen2::roxygenize(). What happens if you run one of these functions? -Kirill On 16.02.2016 18:14, Glenn Schultz wrote: All I am not sure why I am getting this error and I cannot find anything on the net other than try to restart R. I am using Roxygen2 and it clearly says don't edit by hand at the top of the namespace so I am stuck as what to do or look for. Glenn Error in namespaceExport(ns, exports) : undefined exports: View Error: package or namespace load failed for ‘BondLab’ Execution halted * checking whether the namespace can be loaded with stated dependencies ... WARNING Error in namespaceExport(ns, exports) : undefined exports: View Calls: loadNamespace ... namespaceImportFrom -> asNamespace -> loadNamespace -> namespaceExport Execution halted A namespace must be able to be loaded with just the base namespace loaded: otherwise if the namespace gets loaded by a saved object, the session will be unable to start. Probably some imports need to be declared in the NAMESPACE file. * checking whether the namespace can be unloaded cleanly ... OK * checking dependencies in R code ... NOTE Error: package or namespace load failed for ‘BondLab’ Call sequence: 2: stop(gettextf("package or namespace load failed for %s", sQuote(package)), call. = FALSE, domain = NA) Here is my namespace: # Generated by roxygen2: do not edit by hand export(AtomsData) export(BeginBal) export(Bond) export(BondAnalytics) export(BondBasisConversion) export(BondCashFlows) export(CDR.To.MDR) export(CIRBondPrice) export(CIRSim) export(CPR.To.SMM) export(CalibrateCIR) export(CashFlowTable) export(CollateralGroup) export(CusipRecord) export(DollarRoll) export(DollarRollAnalytics) export(Effective.Convexity) export(Effective.Duration) export(Effective.Measure) export(EndingBal) export(EstimYTM) export(Forward.Rate) export(ForwardPassThrough) export(HPISim) export(Interest) export(MBS) export(MakeBondDetails) export(MakeCollateral) export(MakeMBSDetails) export(MakeModelTune) export(MakeRAID) export(MakeRDME) export(MakeScenario) export(MakeSchedule) export(MakeTranche) export(ModelTune) export(Mortgage.Monthly.Payment) export(Mortgage.OAS) export(MortgageCashFlow) export(MortgageCashFlowArray) export(MortgageCashFlow_Array) export(MortgageRate) export(Mtg.Scenario) export(MtgRate) export(MtgTermStructure) export(PPC.Ramp) export(PassThroughOAS) export(PaymentDate) export(PrepaymentAssumption) export(RDMEData) export(RDMEFactor) export(REMICDeal) export(REMICGroupConn) export(REMICSchedules) export(REMICWaterFall) export(Rates) export(ReadRAID) export(Remain.Balance) export(RemicStructure) export(SMM.To.CPR) export(SMMVector.To.CPR) export(SaveCollGroup) export(SaveMBS) export(SaveModelTune) export(SaveRAID) export(SaveRDME) export(SaveREMIC) export(SaveScenario) export(SaveSchedules) export(SaveTranche) export(SaveTranches) export(ScenarioCall) export(Sched.Prin) export(SwapRateData) export(TermStructure) export(TimeValue) export(Tranches) export(ULTV) export(bondprice) exportClasses(AtomsAnalytics) exportClasses(AtomsData) exportClasses(AtomsScenario) exportClasses(BondCashFlows) exportClasses(BondDetails) exportClasses(BondTermStructure) exportClasses(MBSDetails) exportClasses(TermStructure) import(data.tree) import(methods) import(optimx) importFrom(lubridate,"%m+%") importFrom(lubridate,day) importFrom(lubridate,month) 
importFrom(lubridate,year) importFrom(lubridate,years) importFrom(termstrc,create_cashflows_matrix) importFrom(termstrc,create_maturities_matrix) importFrom(termstrc,estim_cs) importFrom(termstrc,estim_nss) importFrom(termstrc,forwardrates) importFrom(termstrc,spotrates) __ R-package-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-package-devel __ R-package-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-package-devel
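For completeness, a sketch of the regeneration step suggested above (run from the package root). Note that "undefined exports: View" usually means the NAMESPACE being installed contains an export(View) line with no matching object, so the installed NAMESPACE may differ from the one posted:
  devtools::document()   # or roxygen2::roxygenize("."); rewrites man/ and NAMESPACE from the #' @export tags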
[Rd] R CMD check --as-cran without qpdf
Today, a package that has an HTML vignette (but no PDF vignette) failed R CMD check --as-cran on a system without qpdf. I think the warning originates here [1], due to a premature check for the existence of qpdf [2]. Setting R_QPDF=true (as in /bin/true) helped, but perhaps it's possible to check qpdf existence only when it matters. I have attached a patch (untested) that could serve as a starting point. The code links correspond to SVN revision 69500. Thanks. Best regards Kirill [1] https://github.com/wch/r-source/blob/f42ee5e7ecf89a245afd6619b46483f1e3594ab7/src/library/tools/R/check.R#L322-L326, [2] https://github.com/wch/r-source/blob/f42ee5e7ecf89a245afd6619b46483f1e3594ab7/src/library/tools/R/check.R#L4426-L4428 diff --git src/library/tools/R/check.R src/library/tools/R/check.R index a508453..e4e5027 100644 --- src/library/tools/R/check.R +++ src/library/tools/R/check.R @@ -319,11 +319,7 @@ setRlibs <- paste(" file", paste(sQuote(miss[f]), collapse = ", "), "will not be installed: please remove it\n")) } -if (dir.exists("inst/doc")) { -if (R_check_doc_sizes) check_doc_size() -else if (as_cran) -warningLog(Log, "'qpdf' is needed for checks on size reduction of PDFs") -} +if (R_check_doc_sizes && dir.exists("inst/doc")) check_doc_size() if (dir.exists("inst/doc") && do_install) check_doc_contents() if (dir.exists("vignettes")) check_vign_contents(ignore_vignettes) if (!ignore_vignettes) { @@ -2129,12 +2125,18 @@ setRlibs <- check_doc_size <- function() { -## Have already checked that inst/doc exists and qpdf can be found +## Have already checked that inst/doc exists pdfs <- dir('inst/doc', pattern="\\.pdf", recursive = TRUE, full.names = TRUE) pdfs <- setdiff(pdfs, "inst/doc/Rplots.pdf") if (length(pdfs)) { checkingLog(Log, "sizes of PDF files under 'inst/doc'") +if (!nzchar(Sys.which(Sys.getenv("R_QPDF", "qpdf" { +if (as_cran) +warningLog(Log, "'qpdf' is needed for checks on size reduction of PDFs") +return() +} + any <- FALSE td <- tempfile('pdf') dir.create(td) @@ -4424,8 +4426,7 @@ setRlibs <- config_val_to_logical(Sys.getenv("_R_CHECK_PKG_SIZES_", "TRUE")) && nzchar(Sys.which("du")) R_check_doc_sizes <- - config_val_to_logical(Sys.getenv("_R_CHECK_DOC_SIZES_", "TRUE")) && -nzchar(Sys.which(Sys.getenv("R_QPDF", "qpdf"))) + config_val_to_logical(Sys.getenv("_R_CHECK_DOC_SIZES_", "TRUE")) R_check_doc_sizes2 <- config_val_to_logical(Sys.getenv("_R_CHECK_DOC_SIZES2_", "FALSE")) R_check_code_assign_to_globalenv <- __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
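The R_QPDF=true workaround mentioned above can also be applied from within R before launching the check (the tarball name is hypothetical; 'true' is the no-op binary on Unix-alikes):
  Sys.setenv(R_QPDF = "true")   # makes the qpdf lookup succeed with a no-op
  system2("R", c("CMD", "check", "--as-cran", "mypkg_1.0.tar.gz"))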
Re: [Rd] static pdf vignette
Perhaps the R.rsp package by Henrik Bengtsson [1,2] is an option. Cheers Kirill [1] http://cran.r-project.org/web/packages/R.rsp/index.html [2] https://github.com/HenrikBengtsson/R.rsp On 27.02.2015 02:44, Wang, Zhu wrote: Dear all, In my package I have a computationally expensive Rnw file which can't pass R CMD check. Therefore I set eval=FALSE in the Rnw file. But I would like to have the pdf vignette generated by the Rnw file with eval=TRUE. It seems to me a static pdf vignette is an option. Any suggestions on this? Thanks, Zhu Wang **Connecticut Children's Confidentiality Notice** This e-mail message, including any attachments, is for...{{dropped:6}} __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
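A sketch of what the R.rsp route looks like for a pre-built ("static") PDF vignette; file names are hypothetical, see the R.rsp documentation for the authoritative setup:
  # vignettes/static.pdf       -- the pre-built PDF, shipped with the package
  # vignettes/static.pdf.asis  -- a stub containing:
  #   %\VignetteIndexEntry{My static vignette}
  #   %\VignetteEngine{R.rsp::asis}
  # DESCRIPTION gains:
  #   Suggests: R.rsp
  #   VignetteBuilder: R.rsp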
Re: [Rd] xtabs and NA
On 09.02.2015 16:59, Gabor Grothendieck wrote: On Mon, Feb 9, 2015 at 8:52 AM, Kirill Müller kirill.muel...@ivt.baug.ethz.ch wrote: Passing table the output of model.frame would still allow the use of a formula interface: mf <- model.frame(~ data, na.action = na.pass) do.call(table, c(mf, useNA = "ifany")) a b c <NA> 1 1 1 1 Fair enough, this qualifies as a workaround, and IMO this is how xtabs should handle it internally to allow writing xtabs(~data, na.action = na.pass) -- or at least xtabs(~data, na.action = na.pass, exclude = NULL) if backward compatibility is desired. Would anyone with write access to R's SVN repo care enough about this situation to review a patch? Thanks. -Kirill __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
[Rd] xtabs and NA
Hi I haven't found a way to produce a tabulation from factor data with NA values using xtabs. Please find a minimal example below, it's also on R-pubs [1]. Tested with R 3.1.2 and R-devel r67720. It doesn't seem to be documented explicitly that it's not supported. From reading the code [2] it looks like the relevant call to table() doesn't set the useNA parameter, which I think is necessary to make NAs show up in the result. Am I missing anything? If this is a bug -- would a patch be welcome? Do we need compatibility with the current behavior? I'm aware of workarounds, I just prefer xtabs() over table() for its interface. Thanks. Best regards Kirill [1] http://rpubs.com/krlmlr/xtabs-NA [2] https://github.com/wch/r-source/blob/780021752eb83a71e2198019acf069ba8741103b/src/library/stats/R/xtabs.R#L60 data <- factor(letters[1:4], levels = letters[1:3]) data ## [1] a b c <NA> ## Levels: a b c xtabs(~data) ## data ## a b c ## 1 1 1 xtabs(~data, na.action = na.pass) ## data ## a b c ## 1 1 1 xtabs(~data, na.action = na.pass, exclude = numeric()) ## data ## a b c ## 1 1 1 xtabs(~data, na.action = na.pass, exclude = NULL) ## data ## a b c ## 1 1 1 sessionInfo() ## R version 3.1.2 (2014-10-31) ## Platform: x86_64-pc-linux-gnu (64-bit) ## ## locale: ## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C ## [3] LC_TIME=de_CH.UTF-8 LC_COLLATE=en_US.UTF-8 ## [5] LC_MONETARY=de_CH.UTF-8 LC_MESSAGES=en_US.UTF-8 ## [7] LC_PAPER=de_CH.UTF-8 LC_NAME=C ## [9] LC_ADDRESS=C LC_TELEPHONE=C ## [11] LC_MEASUREMENT=de_CH.UTF-8 LC_IDENTIFICATION=C ## ## attached base packages: ## [1] stats graphics grDevices utils datasets methods base ## ## other attached packages: ## [1] magrittr_1.5 ProjectTemplate_0.6-1.0 ## ## loaded via a namespace (and not attached): ## [1] digest_0.6.8 evaluate_0.5.7 formatR_1.0.3 htmltools_0.2.6 ## [5] knitr_1.9.2 rmarkdown_0.5.1 stringr_0.6.2 tools_3.1.2 ## [9] ulimit_0.0-2 __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
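Another workaround in the same spirit (a sketch): addNA() from base R turns the missing values into an explicit level before tabulation, so neither na.action nor useNA needs to be touched.
  xtabs(~ addNA(data))   # the NA observation should now show up as its own <NA> level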
Re: [Rd] How to test impact of candidate changes to package?
If you don't intend to keep the old business logic in the long run, perhaps a version control system such as Git can help you. If you use it in single-user mode, you can think of it as a backup system where you manually create each snapshot and give it a name, but it actually can do much more. For your use case, you can open a new *branch* where you implement your changes, and implement your testing logic simultaneously in both branches (using *merge* operations). The system handles switching between branches, so you can really perform invasive changes, and revert if you find that a particular change breaks something. RStudio has Git support, but you probably need to use the shell to create a branch. On Windows or OS X the GitHub client helps you to get started. Cheers Kirill On 09/10/2014 11:14 AM, Stephanie Locke wrote: I have unit tests using testthat but these are typically of these types: 1) Check for correct calculation for a single set of valid inputs 2) Check for correct calculation for a larger set of valid inputs 3) Check for errors when providing incorrect inputs 4) Check for known frailties / past issues This is more for where changes are needed to functions that apply various bits of business logic that can change over time, so there is no one answer. A unit test (at least as I understand it) can be worked through to make sure that given inputs, the output is computationally correct. What I'd like to do is overall the impact of a potential change by testing version 1 of a function in a package for a sample, then test version 2 of a function in a package for a sample and compare the results. My difficulties encountered so far is I'm reluctantly to manually do this change invasively by overwriting the relevant files in the R directory, and then say using devtools to load it and test it with testthat as I risk producing incorrect states of my package and potentially releasing the wrong thing. My preference would be a non-invasive method. Currently, where I'm trying to do this non-invasively I source a new version of the function stored in a separate directory, but some of the functions dependent on it continue to reference to the package version of the functions, this means that when I'm doing test #2 I have to load lots more functions and hope I've caught them all (or do some sort of dependency hunting programmatically). I may be missing something about testthat, but what I'm doing now seems to be nowhere near optimal and I'd love to have a better solution. Cheers Stephanie Locke BI Credit Risk Analyst -Original Message- From: ONKELINX, Thierry [mailto:thierry.onkel...@inbo.be] Sent: 10 September 2014 09:30 To: Stephanie Locke; r-devel@r-project.org Subject: RE: How to test impact of candidate changes to package? Dear Stephanie, Have a look at the testthat package and the related article in the R Journal. Best regards, ir. Thierry Onkelinx Instituut voor natuur- en bosonderzoek / Research Institute for Nature and Forest team Biometrie Kwaliteitszorg / team Biometrics Quality Assurance Kliniekstraat 25 1070 Anderlecht Belgium + 32 2 525 02 51 + 32 54 43 61 85 thierry.onkel...@inbo.be www.inbo.be To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of. ~ Sir Ronald Aylmer Fisher The plural of anecdote is not data. 
~ Roger Brinner The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data. ~ John Tukey -Oorspronkelijk bericht- Van: r-devel-boun...@r-project.org [mailto:r-devel-boun...@r-project.org] Namens Stephanie Locke Verzonden: woensdag 10 september 2014 9:55 Aan: r-devel@r-project.org Onderwerp: [Rd] How to test impact of candidate changes to package? I use a package to contain simple functions that can be handled by unit tests for correctness and more complex functions that combine the simple functions with business logic. Where there are proposals to change either the simple functions or the business logic, a sample needs to be run before the change and then after it to understand the impact of the change. I do this currently by 1. Using Rmarkdown documents 2. Loading the package as-is 3. Getting my sample 4. Running my sample through the package as-is and outputting table of results 5. sourceing new copies of functions 6. Running my sample again and outputting table of results 7. Reloading package and sourceing different copies of functions as required I really don't think this is a good way to do this as it risks missing downstream dependencies of the functions I'm trying to load into the global namespace to test. Has anyone else had to do this sort of testing before on their packages? How did you do it? Am I missing an obvious package / framework that can do this? Cheers, Steph -- Stephanie
Re: [Rd] Request to review a patch for rpart
Gabriel Thanks for your feedback. Indeed, I was not particularly clear here. The empty model is just a very special case in a more general setting. I'd have to work around this deficiency in my code -- sure I can do that, but I thought a generic solution should be possible. In particular, I'm using predict.rpart(..., type = prob) -- this just reflects the observed relative frequencies. Cheers Kirill On 08/15/2014 06:44 PM, Gabriel Becker wrote: Kirill, Perhaps I'm just being obtuse, but what are you proposing rpart do in the case of an empty model? Return a tree that always guesses the most common label, or doesn't guess at all (NA)? It doesn't seem like you'd need rpart for either of those. ~G On Wed, Aug 13, 2014 at 3:51 AM, Kirill Müller kirill.muel...@ivt.baug.ethz.ch mailto:kirill.muel...@ivt.baug.ethz.ch wrote: Dear list For my work, it would be helpful if rpart worked seamlessly with an empty model: library(rpart); rpart(formula=y~0, data=data.frame(y=factor(1:10))) Currently, an unrelated error (originating from na.rpart) is thrown. At some point in the near future, I'd like to release a package to CRAN which uses rpart and relies on that functionality. I have prepared a patch (minor modifications at three places, and a test) which I'd like to propose for inclusion in the next CRAN release of rpart. The patch can be reviewed at https://github.com/krlmlr/rpart/tree/empty-model, the files (based on the current CRAN release 4.1-8) can be downloaded from https://github.com/krlmlr/rpart/archive/empty-model.zip. Thanks for your attention. With kindest regards Kirill Müller __ R-devel@r-project.org mailto:R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel -- Gabriel Becker Graduate Student Statistics Department University of California, Davis __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
[Rd] Request to review a patch for rpart
Dear list For my work, it would be helpful if rpart worked seamlessly with an empty model: library(rpart); rpart(formula=y~0, data=data.frame(y=factor(1:10))) Currently, an unrelated error (originating from na.rpart) is thrown. At some point in the near future, I'd like to release a package to CRAN which uses rpart and relies on that functionality. I have prepared a patch (minor modifications at three places, and a test) which I'd like to propose for inclusion in the next CRAN release of rpart. The patch can be reviewed at https://github.com/krlmlr/rpart/tree/empty-model, the files (based on the current CRAN release 4.1-8) can be downloaded from https://github.com/krlmlr/rpart/archive/empty-model.zip. Thanks for your attention. With kindest regards Kirill Müller __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
[Rd] UTC time zone on Windows
Hi I'm having trouble running R CMD build and check with UTC time zone setting in Windows Server 2012. I can't seem to get rid of the following warning: unable to identify current timezone 'C': please set environment variable 'TZ' However, setting TZ to either Europe/London or GMT Standard Time didn't help. It seems to me that the warning originates in registryTZ.c (https://github.com/wch/r-source/blob/776708efe6003e36f02587ad47b2e19e2f69/src/extra/tzone/registryTZ.c#L363). I have therefore looked at HKLM\SYSTEM\CurrentControlSet\Control\TimeZoneInformation, to learn that TimeZoneKeyName is set to UTC. This time zone is not defined in TZtable, but is present in this machine's HKLM\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Time Zones. (Also, the text of the warning permits the possibility that only the first character of the time zone is used for the warning message -- in the code, a const wchar_t* is used for a %s placeholder.) Below is a link to the log of such a failing run. The first 124 lines are registry dumps, output of R CMD * is near the end of the log at lines 212 and 224. https://ci.appveyor.com/project/krlmlr/r-appveyor/build/1.0.36 This happens with R 3.1.1 and R-devel r66309. Is there a workaround I have missed, short of updating TZtable? How can I help updating TZtable? Thanks! Cheers Kirill __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] NOTE when detecting mismatch in output, and codes for NOTEs, WARNINGs and ERRORs
On 03/26/2014 06:46 PM, Paul Gilbert wrote: On 03/26/2014 04:58 AM, Kirill Müller wrote: Dear list It is possible to store expected output for tests and examples. From the manual: If tests has a subdirectory Examples containing a file pkg-Ex.Rout.save, this is compared to the output file for running the examples when the latter are checked. And, earlier (written in the context of test output, but apparently applies here as well): ..., these two are compared, with differences being reported but not causing an error. I think a NOTE would be appropriate here, in order to be able to detect this by only looking at the summary. Is there a reason for not flagging differences here? The problem is that differences occur too often because this is a comparison of characters in the output files (a diff). Any output that is affected by locale, node name or Internet downloads, time, host, or OS, is likely to cause a difference. Also, if you print results to a high precision you will get differences on different systems, depending on OS, 32 vs 64 bit, numerical libraries, etc. A better test strategy when it is numerical results that you want to compare is to do a numerical comparison and throw an error if the result is not good, something like r - result from your function rGood - known good value fuzz - 1e-12 #tolerance if (fuzz max(abs(r - rGood))) stop('Test xxx failed.') It is more work to set up, but the maintenance will be less, especially when you consider that your tests need to run on different OSes on CRAN. You can also use try() and catch error codes if you want to check those. Thanks for your input. To me, this is a different kind of test, for which I'd rather use the facilities provided by the testthat package. Imagine a function that operates on, say, strings, vectors, or data frames, and that is expected to produce completely identical results on all platforms -- here, a character-by-character comparison of the output is appropriate, and I'd rather see a WARNING or ERROR if something fails. Perhaps this functionality can be provided by external packages like roxygen and testthat: roxygen could create the good output (if asked for) and set up a testthat test that compares the example run with the good output. This would duplicate part of the work already done by base R; the duplication could be avoided if there was a way to specify the severity of a character-level difference between output and expected output, perhaps by means of an .Rout.cfg file in DCF format: OnDifference: mute|note|warning|error Normalize: [R expression] Fuzziness: [number of different lines that are tolerated] On that note: Is there a convenient way to create the .Rout.save files in base R? By convenient I mean a single function call, not checking and manually copying as suggested here: https://stat.ethz.ch/pipermail/r-help/2004-November/060310.html . Cheers Kirill __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
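For readability, the numerical comparison sketched above written out as runnable code (values are made up; a testthat expectation achieves much the same):
  r     <- my_function(x)        # hypothetical function under test
  rGood <- c(1.2345, 2.3456)     # known good values (made up)
  fuzz  <- 1e-12                 # tolerance
  if (max(abs(r - rGood)) > fuzz) stop("Test xxx failed.")
  # or: testthat::expect_equal(r, rGood, tolerance = fuzz)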
[Rd] NOTE when detecting mismatch in output, and codes for NOTEs, WARNINGs and ERRORs
Dear list It is possible to store expected output for tests and examples. From the manual: If tests has a subdirectory Examples containing a file pkg-Ex.Rout.save, this is compared to the output file for running the examples when the latter are checked. And, earlier (written in the context of test output, but apparently applies here as well): ..., these two are compared, with differences being reported but not causing an error. I think a NOTE would be appropriate here, in order to be able to detect this by only looking at the summary. Is there a reason for not flagging differences here? The following is slightly related: Some compilers and static code analysis tools assign a numeric code to each type of error or warning they check for, and print it. Would that be possible to do for the anomalies detected by R CMD check? The most significant digit could denote the severity of the NOTE, WARNING or ERROR. This would further simplify (semi-)automated analysis of the output of R CMD check, e.g. in the context of automated tests. Best regards Kirill __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] Docker versus Vagrant for reproducability - was: The case for freezing CRAN
On 03/22/2014 02:10 PM, Nathaniel Smith wrote: On 22 Mar 2014 12:38, Philippe GROSJEAN philippe.grosj...@umons.ac.be wrote: On 21 Mar 2014, at 20:21, Gábor Csárdi csardi.ga...@gmail.com wrote: In my opinion it is somewhat cumbersome to use this for everyday work, although good virtualization software definitely helps. Gabor Additional info: you access R into the VM from within the host by ssh. You can enable x11 forwarding there and you also got GUI stuff. It works like a charm, but there are still some problems on my side when I try to disconnect and reconnect to the same R process. I can solve this with, say, screen. However, if any X11 window is displayed while I disconnect, R crashes immediately on reconnection. You might find the program 'xpra' useful. It's like screen, but for x11 programs. -n I second that. However, by default, xpra and GNU Screen are not aware of each other. To connect to xpra from within GNU Screen, you usually need to set the DISPLAY environment variable manually. I have described a solution that automates this, so that GUI applications just work from within GNU Screen and also survive a disconnect: http://krlmlr.github.io/integrating-xpra-with-screen/ . -Kirill __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
[Rd] Deep copy of factor levels?
Hi It seems that selecting an element of a factor will copy its levels (Ubuntu 13.04, R 3.0.2). Below is the output of a script that creates a factor with 1e4 elements and then calls as.list() on it. The new object seems to use more than 700 MB, and inspection of the levels of the individual elements of the list suggests that they are distinct objects. Perhaps some performance gain could be achieved by copying the levels by reference, but I don't know R internals well enough to see if it's possible. Is there a particular reason for creating a full copy of the factor levels? This has come up when looking at the performance of rbind.fill (in the plyr package) with factors: https://github.com/hadley/plyr/issues/206 . Best regards Kirill gc() used (Mb) gc trigger (Mb) max used (Mb) Ncells 325977 17.5 1074393 57.4 10049951 536.8 Vcells 4617168 35.3 87439742 667.2 204862160 1563.0 system.time(x <- factor(seq_len(1e4))) user system elapsed 0.008 0.000 0.007 system.time(xx <- as.list(x)) user system elapsed 4.263 0.000 4.322 gc() used (Mb) gc trigger (Mb) max used (Mb) Ncells 385991 20.7 1074393 57.4 10049951 536.8 Vcells 104672187 798.6 112367694 857.3 204862160 1563.0 .Internal(inspect(levels(xx[[1]]))) @387f620 16 STRSXP g1c7 [MARK,NAM(2)] (len=1, tl=0) @144da4e8 09 CHARSXP g1c1 [MARK,gp=0x60] [ASCII] [cached] 1 @144da518 09 CHARSXP g1c1 [MARK,gp=0x60] [ASCII] [cached] 2 @27d1298 09 CHARSXP g1c1 [MARK,gp=0x60] [ASCII] [cached] 3 @144da548 09 CHARSXP g1c1 [MARK,gp=0x60] [ASCII] [cached] 4 @144da578 09 CHARSXP g1c1 [MARK,gp=0x60] [ASCII] [cached] 5 ... .Internal(inspect(levels(xx[[2]]))) @1b38cb90 16 STRSXP g1c7 [MARK,NAM(2)] (len=1, tl=0) @144da4e8 09 CHARSXP g1c1 [MARK,gp=0x60] [ASCII] [cached] 1 @144da518 09 CHARSXP g1c1 [MARK,gp=0x60] [ASCII] [cached] 2 @27d1298 09 CHARSXP g1c1 [MARK,gp=0x60] [ASCII] [cached] 3 @144da548 09 CHARSXP g1c1 [MARK,gp=0x60] [ASCII] [cached] 4 @144da578 09 CHARSXP g1c1 [MARK,gp=0x60] [ASCII] [cached] 5 ... __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
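A sketch of the kind of sharing asked about: building the list elements by hand and attaching the same levels vector, instead of letting as.list() duplicate it. Whether the levels are really shared can be verified with .Internal(inspect(...)) as above; memory use should then stay far below the figures reported.
  x   <- factor(seq_len(1e4))
  lv  <- levels(x)
  xx2 <- lapply(unclass(x), function(i) structure(i, levels = lv, class = "factor"))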
[Rd] Detect a terminated pipe
Hi Is there a way to detect that the process that corresponds to a pipe has ended? On my system (Ubuntu 13.04), I see p <- pipe("true", "w"); Sys.sleep(1); system("ps -elf | grep true | grep -v grep"); isOpen(p) [1] TRUE The "true" process has long ended (as the filtered ps system call emits no output), still R believes that the pipe p is open. Thanks for your input. Best regards Kirill __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] Detect a terminated pipe
On 03/14/2014 03:54 PM, Simon Urbanek wrote: As far as R is concerned, the connection is open. In addition, pipes exist even without the process - you can close one end of a pipe and it will still exist (that’s what makes pipes useful, actually, because you can choose to close arbitrary combination of the R/W ends). Detecting that the other end of the pipe has closed is generally done by sending/receiving data to/from the end of interest - i.e. reading from a pipe that has closed the write end on the other side will yield 0 bytes read. Writing to a pipe that has closed the read end on the other side will yield SIGPIPE error (note that for text connections you have to call flush() to send the buffer): p = pipe("true", "r") readLines(p) character(0) close(p) p = pipe("true", "w") writeLines("", p) flush(p) Error in flush.connection(p) : ignoring SIGPIPE signal close(p) Thanks for your reply. I tried this in an R console and received the error, just like you described. Unfortunately, the error is not thrown when trying the same in RStudio. Any ideas? Cheers Kirill __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
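Simon's description can be turned into a small probe (a sketch for Unix-alikes; per the exchange above it may behave differently in RStudio):
  p <- pipe("true", "w")
  Sys.sleep(1)                                    # give the child time to exit
  alive <- tryCatch({ writeLines("ping", p); flush(p); TRUE },
                    error = function(e) FALSE)
  alive                                           # expected FALSE: SIGPIPE surfaces as an R error
  close(p)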
[Rd] $new cannot be accessed when running from Rscript and methods package is not loaded
Hi Accessing the $new method for a class defined in a package fails if the methods package is not loaded. I have created a test package with the following single code file: newTest <- function() { cl <- get("someClass") cl$new } someClass <- setRefClass("someClass") (This is similar to code actually used in the testthat package.) If methods is not loaded, executing the newTest function fails in the following scenarios: - Package depends on methods (scenario depend) - Package imports methods and imports either the setRefClass function (scenario import-setRefClass) or the whole package (scenario import-methods) It succeeds if the newTest function calls require(methods) (scenario require). The script at https://raw2.github.com/krlmlr/methodsTest/master/test-all.sh creates an empty user library in subdirectory r-lib of the current directory, installs devtools, and tests the four scenarios by repeatedly installing the corresponding version of the package and trying to execute newTest() from Rscript. I have attached the output. The package itself is on GitHub: https://github.com/krlmlr/methodsTest , there is a branch for each scenario. Why does it seem to be necessary to load the methods package here? Best regards Kirill __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
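For reference, the variant that works according to the scenarios above is the one that attaches methods inside the function (a sketch of the package code):
  newTest <- function() {
    require(methods)   # attaching -- not merely importing -- is what makes the difference here
    cl <- get("someClass")
    cl$new
  }
  someClass <- setRefClass("someClass")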
Re: [Rd] $new cannot be accessed when running from Rscript and methods package is not loaded
On 02/11/2014 03:22 AM, Peter Meilstrup wrote: Because depends is treated incorrectly (if I may place a value judgement on it). I had an earlier thread on this, not sure if any changes have taken place since then: http://r.789695.n4.nabble.com/Dependencies-of-Imports-not-attached-td4666529.html Peter Thanks. Could you please clarify: The thread you mention refers to a scenario where a package uses another package that depends on methods. The issue I'm describing doesn't have this, there is only a single package that tries to use $new and fails. ? On that note: A related discussion on R-devel advises depending on methods, but this doesn't seem to be enough in this case: http://r.789695.n4.nabble.com/advise-on-Depends-td4678930.html -Kirill __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] file.exists does not like path names ending in /
On 01/17/2014 02:56 PM, Gabor Grothendieck wrote: At the moment I am using this to avoid the problem: File.exists <- function(x) { if (.Platform$OS.type == "windows" && grepl("[/\\]$", x)) { file.exists(dirname(x)) } else file.exists(x) } but it would be nice if that could be done by file.exists itself. I think that ignoring a terminal slash/backslash on Windows would do no harm: It would improve consistency between platforms, and perhaps nobody really relies on the current behavior. Would shorten the documentation, too. -Kirill __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] file.exists does not like path names ending in /
On 01/17/2014 07:35 PM, William Dunlap wrote: I think that ignoring a terminal slash/backslash on Windows would do no harm: Windows makes a distinction between C: and C:/: the former is not a file (or directory) and the latter is. But, according to the documentation, neither would be currently detected by file.exists, while the latter is a directory, as you said, and should be detected as such. -Kirill __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
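A sketch of a helper that ignores a trailing separator except on a bare drive root, so that "C:/" (which is a directory) is still passed through unchanged; the regular expression is not exhaustively tested:
  exists_ignoring_trailing_slash <- function(x) {
    # strip a trailing "/" or "\" only when the preceding character is not ":",
    # so "C:/foo/" becomes "C:/foo" but "C:/" stays as it is
    file.exists(sub("(.[^:])[/\\\\]+$", "\\1", x))
  }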
Re: [Rd] Sweave trims console output in tex mode
On 01/03/2014 02:34 AM, Duncan Murdoch wrote: Carriage returns usually don't matter in LaTeX I'd rather say they do. One is like a space, two or more end a paragraph and start a new one. If newlines are stripped away, the meaning of the TeX code can change, in some cases dramatically (e.g. if comments are written to the TeX code). Also, I don't understand why the option is called strip.white, at least for results=tex. The docs say that blank lines at the beginning and end of output are removed, but the observed behavior is to remove the terminating carriage return of the output. -Kirill __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] Sweave trims console output in tex mode
I'm sorry, I didn't mean to be rude. Do you prefer including the entire original message when replying? Or perhaps I misunderstood you when you wrote: Carriage returns usually don't matter in LaTeX, so I didn't even know about this option, though I use results=tex quite often. I had to look at the source to see where the newlines were going, and saw it there. Could you please clarify? Thanks. -Kirill On 01/03/2014 11:39 AM, Duncan Murdoch wrote: It's dishonest to quote me out of context. Duncan Murdoch On 14-01-03 3:40 AM, Kirill Müller wrote: On 01/03/2014 02:34 AM, Duncan Murdoch wrote: Carriage returns usually don't matter in LaTeX I'd rather say they do. One is like a space, two or more end a paragraph and start a new one. If newlines are stripped away, the meaning of the TeX code can change, in some cases dramatically (e.g. if comments are written to the TeX code). Also, I don't understand why the option is called strip.white, at least for results=tex. The docs say that blank lines at the beginning and end of output are removed, but the observed behavior is to remove the terminating carriage return of the output. -Kirill __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] Sweave trims console output in tex mode
On 01/03/2014 01:06 PM, Duncan Murdoch wrote: On 14-01-03 5:47 AM, Kirill Müller wrote: I'm sorry, I didn't mean to be rude. Do you prefer including the entire original message when replying? Or perhaps I misunderstood you when you wrote: You don't need to include irrelevant material in your reply, but you should include explanatory material when you are arguing about a particular claim. If you aren't sure whether it is relevant or not, then you should probably ask for clarification rather than arguing with the claim. Thanks. In the future, I'll quote at least full sentences and everything they refer to, to avoid confusion and make sure that context is maintained. Carriage returns usually don't matter in LaTeX, so I didn't even know about this option, though I use results=tex quite often. I had to look at the source to see where the newlines were going, and saw it there. Could you please clarify? Thanks. Single carriage returns are usually equivalent to spaces. Multiple carriage returns separate paragraphs, but they are rare in code chunk output in my Sweave usage. I normally put plain text in the LaTeX part of the Sweave document. Indeed, it only makes a difference for code that generates large portions of LaTeX (such as tikzDevice). I have checked my own .Rnw files, and I have used results=tex about 600 times, but never used strip.white. I've also looked at the .Rnw files in CRAN packages, and strip.white=true and strip.white=all are used there about 140 times, but strip.white=false is only used 10 times. I think only one package (SweaveListingUtils) uses strip.white=false in combination with results=tex. So while I agree Martin's adaptive option would have been a better default than true, I think it would be more likely to cause trouble than to solve it. I agree, given this data and considering that trimming the terminal newline can be considered a feature. Perhaps comments are the only use case where the newline is really important. But then I don't see how to reliably detect comments, as the catcode for % can be changed, e.g., in a verbatim environment. I'll consider printing a \relax after the comment in tikzDevice, this should be robust and sufficient. -Kirill __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
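The idea mentioned at the end -- terminating a generated comment so that a stripped newline cannot merge it with the following material -- would look roughly like this in the generating R code (a sketch, not the actual tikzDevice implementation):
  cat("% end of generated picture\n\\relax\n")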
[Rd] Sweave trims console output in tex mode
Hi In the example .Rnw file below, only the newline between "c" and "d" is visible in the resulting .tex file after running R CMD Sweave. What is the reason for this behavior? Newlines are important in LaTeX and should be preserved. In particular, this behavior leads to incorrect LaTeX code generated when using tikz(console=TRUE) inside a Sweave chunk, as shown in the tikzDevice vignette. A similar question has been left unanswered before: https://stat.ethz.ch/pipermail/r-help/2010-June/242019.html . I am well aware of knitr, I'm looking for a solution for Sweave. Cheers Kirill \documentclass{article} \begin{document} <<inline,echo=FALSE,results=tex>>= cat("a\n") cat("b\n \n") cat("c\nd") @ \end{document} __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] Sweave trims console output in tex mode
On 01/03/2014 01:45 AM, Duncan Murdoch wrote: You are running with the strip.white option set to TRUE. That strips blank lines at the beginning and end of each output piece. Just set strip.white=FALSE. Thanks, the code below works perfectly. I have also found the documentation in ?RweaveLatex . I'm not sure if the default setting is sensible for results=tex, though. Has this changed in the recent past? -Kirill \documentclass{article} \begin{document} <<inline,echo=FALSE,results=tex,strip.white=FALSE>>= cat("a\n") cat("b\n \n") cat("c\nd") @ \end{document} __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] Sweave trims console output in tex mode
On 01/03/2014 01:59 AM, Duncan Murdoch wrote: But results=tex is not the default. Having defaults for one option depend on the setting for another is confusing, so I think the current setting is appropriate. True. On the other hand, I cannot imagine that results=tex is useful at all without strip.white=FALSE. If the strip.white option auto-adjusted, things would just work. Anyway, I'm not a very active user of Sweave. -Kirill __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] Strategies for keeping autogenerated .Rd files out of a Git tree
Gabor I agree with you. There's Travis CI, and r-travis -- an attempt to integrate R package testing with Travis. Pushing back to GitHub is possible, but the setup is somewhat difficult. Also, this can be subject to race conditions because each push triggers a test run and they can happen in parallel even for the same repository. How do you handle branches? It would be really great to be able to execute custom R code before building. Perhaps in a PreBuild: section in DESCRIPTION? Cheers Kirill On 12/12/2013 02:21 AM, Gábor Csárdi wrote: Hi, this is maybe mostly a personal preference, but I prefer not to put generated files in the vc repository. Changes in the generated files, especially if there is many of them, pollute the diffs and make them less useful. If you really want to be able to install the package directly from github, one solution is to 1. create another repository, that contains the complete generated package, so that install_github() can install it. 2. set up a CI service, that can download the package from github, build the package or the generated files (check the package, while it is at it), and then push the build stuff back to github. 3. set up a hook on github, that invokes the CI after each commit. I have used this setup in various projects with jenkins-ci and it works well. Diffs are clean, the package is checked and built frequently, and people can download it without having to install the tools that generate the generated files. The only downside is that you need to install a CI, so you need a server for that. Maybe you can do this with travis-ci, maybe not, I am not familiar with it that much. Best, Gabor On Wed, Dec 11, 2013 at 7:39 PM, Kirill Müller kirill.muel...@ivt.baug.ethz.ch wrote: Hi Quite a few R packages are now available on GitHub long before they appear on CRAN, installation is simple thanks to devtools::install_github(). However, it seems to be common practice to keep the .Rd files (and NAMESPACE and the Collate section in the DESCRIPTION) in the Git tree, and to manually update it, even if they are autogenerated from the R code by roxygen2. This requires extra work for each update of the documentation and also binds package development to a specific version of roxygen2 (because otherwise lots of bogus changes can be added by roxygenizing with a different version). What options are there to generate the .Rd files during build/install? In https://github.com/hadley/devtools/issues/43 the issue has been discussed, perhaps it can be summarized as follows: - The devtools package is not the right place to implement roxygenize-before-build - A continuous integration service would be better for that, but currently there's nothing that would be easy to use - Roxygenizing via src/Makefile could work but requires further investigation and an installation of Rtools/xcode on Windows/OS X Especially the last point looks interesting to me, but since this is not widely used there must be pitfalls I'm not aware of. The general idea would be: - Place code that builds/updates the .Rd and NAMESPACE files into src/Makefile - Users installing the package from source will require infrastructure (Rtools/make) - For binary packages, the .Rd files are already generated and added to the .tar.gz during R CMD build before they are submitted to CRAN/WinBuilder, and they are also generated (in theory) by R CMD build --binary I'd like to hear your opinion on that. 
I have also found a thread on package development workflow (https://stat.ethz.ch/pipermail/r-devel/2011-September/061955.html) but there was nothing on un-versioning .Rd files. Cheers Kirill __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel -- _ ETH Zürich Institute for Transport Planning and Systems HIL F 32.2 Wolfgang-Pauli-Str. 15 8093 Zürich Phone: +41 44 633 33 17 Fax: +41 44 633 10 57 Secretariat: +41 44 633 31 05 E-Mail: kirill.muel...@ivt.baug.ethz.ch __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
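Whatever form the hook takes (r-travis, a hypothetical PreBuild field, or a CI job), the step it would run is roughly this (a sketch, run from the package root):
  roxygen2::roxygenize(".")   # regenerate man/, NAMESPACE and the Collate field
  devtools::build(".")        # then build the tar.gz that gets published / installed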
Re: [Rd] Strategies for keeping autogenerated .Rd files out of a Git tree
On 12/13/2013 12:50 PM, Romain Francois wrote: Pushing back to github is not so difficult. See e.g http://blog.r-enthusiasts.com/2013/12/04/automated-blogging.html Thanks for the writeup, I'll try this. Perhaps it's better to push the results of `R CMD build`, though. You can manage branches easily in travis. You could for example decide to do something different if you are on the master branch ... That's right. But then no .Rd files are built when I'm on a branch, so I can't easily preview the result. The ideal situation would be: 1. I manage only R source files on GitHub, not Rd files, NAMESPACE nor the Collate section of DESCRIPTION. Machine-readable instructions on how to build those are provided with the package. 2. Anyone can install from GitHub using devtools::install_github(). This also should work for branches, forks and pull requests. 3. I can build the package so that the result can be accepted by CRAN. The crucial point on that list is point 2, the others I can easily solve myself. The way I see it, point 2 can be tackled by extending devtools or extending the ways packages are built. Extending devtools seems to be the inferior approach, although, to be honest, I'd be fine with that as well. -Kirill Romain Le 13 déc. 2013 à 12:03, Kirill Müller kirill.muel...@ivt.baug.ethz.ch mailto:kirill.muel...@ivt.baug.ethz.ch a écrit : Gabor I agree with you. There's Travis CI, and r-travis -- an attempt to integrate R package testing with Travis. Pushing back to GitHub is possible, but the setup is somewhat difficult. Also, this can be subject to race conditions because each push triggers a test run and they can happen in parallel even for the same repository. How do you handle branches? It would be really great to be able to execute custom R code before building. Perhaps in a PreBuild: section in DESCRIPTION? Cheers Kirill On 12/12/2013 02:21 AM, Gábor Csárdi wrote: Hi, this is maybe mostly a personal preference, but I prefer not to put generated files in the vc repository. Changes in the generated files, especially if there is many of them, pollute the diffs and make them less useful. If you really want to be able to install the package directly from github, one solution is to 1. create another repository, that contains the complete generated package, so that install_github() can install it. 2. set up a CI service, that can download the package from github, build the package or the generated files (check the package, while it is at it), and then push the build stuff back to github. 3. set up a hook on github, that invokes the CI after each commit. I have used this setup in various projects with jenkins-ci and it works well. Diffs are clean, the package is checked and built frequently, and people can download it without having to install the tools that generate the generated files. The only downside is that you need to install a CI, so you need a server for that. Maybe you can do this with travis-ci, maybe not, I am not familiar with it that much. Best, Gabor On Wed, Dec 11, 2013 at 7:39 PM, Kirill Müller kirill.muel...@ivt.baug.ethz.ch mailto:kirill.muel...@ivt.baug.ethz.ch wrote: Hi Quite a few R packages are now available on GitHub long before they appear on CRAN, installation is simple thanks to devtools::install_github(). However, it seems to be common practice to keep the .Rd files (and NAMESPACE and the Collate section in the DESCRIPTION) in the Git tree, and to manually update it, even if they are autogenerated from the R code by roxygen2. 
This requires extra work for each update of the documentation and also binds package development to a specific version of roxygen2 (because otherwise lots of bogus changes can be added by roxygenizing with a different version). What options are there to generate the .Rd files during build/install? In https://github.com/hadley/devtools/issues/43 the issue has been discussed, perhaps it can be summarized as follows: - The devtools package is not the right place to implement roxygenize-before-build - A continuous integration service would be better for that, but currently there's nothing that would be easy to use - Roxygenizing via src/Makefile could work but requires further investigation and an installation of Rtools/xcode on Windows/OS X Especially the last point looks interesting to me, but since this is not widely used there must be pitfalls I'm not aware of. The general idea would be: - Place code that builds/updates the .Rd and NAMESPACE files into src/Makefile - Users installing the package from source will require infrastructure (Rtools/make) - For binary packages, the .Rd files are already generated and added to the .tar.gz during R CMD build before they are submitted to CRAN/WinBuilder, and they are also generated (in theory) by R CMD build --binary I'd like to hear your opinion on that. I have also
Re: [Rd] Strategies for keeping autogenerated .Rd files out of a Git tree
Thanks a lot. This would indeed solve the problem. I'll try mkdist today ;-) Is the NEWS file parsed before of after mkdist has been executed? Would you be willing to share the code for the infrastructure, perhaps on GitHub? -Kirill On 12/13/2013 09:14 PM, Simon Urbanek wrote: FWIW this is essentially what RForge.net provides. Each GitHub commit triggers a build (branches are supported as the branch info is passed in the WebHook) which can be either classic R CMD build or a custom shell script (hence you can do anything you want). The result is a tar ball (which includes the generated files) and that tar ball gets published in the R package repository. R CMD check is run as well on the tar ball and the results are published. This way you don't need devtools, users can simply use install.packages() without requiring any additional tools. There are some talks about providing the above as a cloud service, so that anyone can run and/or use it. Cheers, Simon On Dec 13, 2013, at 8:51 AM, Kirill Müller kirill.muel...@ivt.baug.ethz.ch wrote: On 12/13/2013 12:50 PM, Romain Francois wrote: Pushing back to github is not so difficult. See e.g http://blog.r-enthusiasts.com/2013/12/04/automated-blogging.html Thanks for the writeup, I'll try this. Perhaps it's better to push the results of `R CMD build`, though. You can manage branches easily in travis. You could for example decide to do something different if you are on the master branch ... That's right. But then no .Rd files are built when I'm on a branch, so I can't easily preview the result. The ideal situation would be: 1. I manage only R source files on GitHub, not Rd files, NAMESPACE nor the Collate section of DESCRIPTION. Machine-readable instructions on how to build those are provided with the package. 2. Anyone can install from GitHub using devtools::install_github(). This also should work for branches, forks and pull requests. 3. I can build the package so that the result can be accepted by CRAN. The crucial point on that list is point 2, the others I can easily solve myself. The way I see it, point 2 can be tackled by extending devtools or extending the ways packages are built. Extending devtools seems to be the inferior approach, although, to be honest, I'd be fine with that as well. -Kirill Romain Le 13 déc. 2013 à 12:03, Kirill Müller kirill.muel...@ivt.baug.ethz.ch mailto:kirill.muel...@ivt.baug.ethz.ch a écrit : Gabor I agree with you. There's Travis CI, and r-travis -- an attempt to integrate R package testing with Travis. Pushing back to GitHub is possible, but the setup is somewhat difficult. Also, this can be subject to race conditions because each push triggers a test run and they can happen in parallel even for the same repository. How do you handle branches? It would be really great to be able to execute custom R code before building. Perhaps in a PreBuild: section in DESCRIPTION? Cheers Kirill On 12/12/2013 02:21 AM, Gábor Csárdi wrote: Hi, this is maybe mostly a personal preference, but I prefer not to put generated files in the vc repository. Changes in the generated files, especially if there is many of them, pollute the diffs and make them less useful. If you really want to be able to install the package directly from github, one solution is to 1. create another repository, that contains the complete generated package, so that install_github() can install it. 2. 
set up a CI service, that can download the package from github, build the package or the generated files (check the package, while it is at it), and then push the build stuff back to github. 3. set up a hook on github, that invokes the CI after each commit. I have used this setup in various projects with jenkins-ci and it works well. Diffs are clean, the package is checked and built frequently, and people can download it without having to install the tools that generate the generated files. The only downside is that you need to install a CI, so you need a server for that. Maybe you can do this with travis-ci, maybe not, I am not familiar with it that much. Best, Gabor On Wed, Dec 11, 2013 at 7:39 PM, Kirill Müller kirill.muel...@ivt.baug.ethz.ch mailto:kirill.muel...@ivt.baug.ethz.ch wrote: Hi Quite a few R packages are now available on GitHub long before they appear on CRAN, installation is simple thanks to devtools::install_github(). However, it seems to be common practice to keep the .Rd files (and NAMESPACE and the Collate section in the DESCRIPTION) in the Git tree, and to manually update it, even if they are autogenerated from the R code by roxygen2. This requires extra work for each update of the documentation and also binds package development to a specific version of roxygen2 (because otherwise lots of bogus changes can be added by roxygenizing with a different version). What options are there to generate the .Rd files during build/install? In https://github.com/hadley/devtools/issues/43 the issue has been discussed
Re: [Rd] Strategies for keeping autogenerated .Rd files out of a Git tree
On 12/13/2013 06:09 PM, Brian Diggs wrote: One downside I can see with this third approach is that by making the package documentation generation part of the build process, you must then make the package depend/require roxygen (or whatever tools you are using to generate documentation). This dependence, though, is just to build the package, not to actually use the package. And by pushing this dependency onto the end users of the package, you have transferred the problem you mentioned (... and also binds package development to a specific version of roxygen2 ...) to the many end users rather than the few developers. That's right. As outlined in another message, roxygen2 would be required for building from the raw source (hosted on GitHub) but not for installing from a source tarball (which would contain the .Rd files). Not sure if that's possible, though. __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
[Rd] Strategies for keeping autogenerated .Rd files out of a Git tree
Hi Quite a few R packages are now available on GitHub long before they appear on CRAN, installation is simple thanks to devtools::install_github(). However, it seems to be common practice to keep the .Rd files (and NAMESPACE and the Collate section in the DESCRIPTION) in the Git tree, and to manually update it, even if they are autogenerated from the R code by roxygen2. This requires extra work for each update of the documentation and also binds package development to a specific version of roxygen2 (because otherwise lots of bogus changes can be added by roxygenizing with a different version). What options are there to generate the .Rd files during build/install? In https://github.com/hadley/devtools/issues/43 the issue has been discussed, perhaps it can be summarized as follows: - The devtools package is not the right place to implement roxygenize-before-build - A continuous integration service would be better for that, but currently there's nothing that would be easy to use - Roxygenizing via src/Makefile could work but requires further investigation and an installation of Rtools/xcode on Windows/OS X Especially the last point looks interesting to me, but since this is not widely used there must be pitfalls I'm not aware of. The general idea would be: - Place code that builds/updates the .Rd and NAMESPACE files into src/Makefile - Users installing the package from source will require infrastructure (Rtools/make) - For binary packages, the .Rd files are already generated and added to the .tar.gz during R CMD build before they are submitted to CRAN/WinBuilder, and they are also generated (in theory) by R CMD build --binary I'd like to hear your opinion on that. I have also found a thread on package development workflow (https://stat.ethz.ch/pipermail/r-devel/2011-September/061955.html) but there was nothing on un-versioning .Rd files. Cheers Kirill __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
[Rd] Trouble running Rtools31 on Wine
Hi An attempt to use R and Rtools in Wine fails, see the bug report to Wine: http://bugs.winehq.org/show_bug.cgi?id=34865 The people there say that Rtools uses an outdated Cygwin DLL with a custom patch. Is there any chance we can upgrade our Cygwin DLL to a supported upstream version? Thanks. Cheers Kirill [[alternative HTML version deleted]] __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel