[R-pkg-devel] Run garbage collector when too many open files
In my package I open handles to temporary files from C++; handles to them are returned to R through external pointer (vptr) objects. The files are deleted when the corresponding R object is deleted and the garbage collector runs:

    a <- lvec(10, "integer")
    rm(a)

When the garbage collector runs, the file is deleted. However, on some platforms (probably those with lower limits on the maximum number of file handles a process can have open), I run into the problem that the garbage collector doesn't run often enough. In that case another package of mine that uses this package generates an error when its tests are run.

The simplest solution is to add some calls to gc() in my tests, but a more general/automatic solution would be nice. I thought about something along the lines of:

    robust_lvec <- function(...) {
      tryCatch({
        lvec(...)
      }, error = function(e) {
        gc()
        lvec(...)  # duplicated code
      })
    }

i.e. try to open a file, and when that fails, call the garbage collector and try again. However, this introduces duplicated code (in this case only one line, but it can be more), and it doesn't help if another function tries to open a file. Is there a better solution?

Thanks!
Jan

__
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel
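One way to avoid duplicating the call at each retry site is to capture the expression once and evaluate it twice. The following is a minimal sketch, not from the thread; the helper name `with_gc_retry` is hypothetical:

```r
# Hypothetical helper: evaluate an expression; on error, run gc() (which may
# close file handles held by unreferenced objects) and retry exactly once.
with_gc_retry <- function(expr) {
  expr_q <- substitute(expr)   # capture the unevaluated call
  env <- parent.frame()        # evaluate it in the caller's frame
  tryCatch(
    eval(expr_q, env),
    error = function(e) {
      gc()                     # free handles held by collectable objects
      eval(expr_q, env)        # retry once; a second failure propagates
    }
  )
}

# usage sketch: robust_lvec <- function(...) with_gc_retry(lvec(...))
```

This keeps the retried code in one place, though it still only helps for calls that are wrapped in it, not for arbitrary functions that open files.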
[R-pkg-devel] Tests run without --as-cran and not with
I am trying to solve an issue where my tests run fine with R CMD check, but not with R CMD check --as-cran. In the tests, pandoc is called using system(); pandoc then calls R again with a script that is part of the package. That last part seems to fail; see the error message below:

    Running ‘test_mdweave.R’ ERROR
    Running the tests in ‘tests/test_mdweave.R’ failed.
    Last 13 lines of output:
      > message("Weave file")
      Weave file
      > fn <- system.file(file.path("examples", md), package = "tinymarkdown")
      > mdweave(fn)
      Error running filter /home/eoos/git/tinymarkdown/work/tinymarkdown.Rcheck/tinymarkdown/scripts/filter.R:
      Filter returned error status 1

As mentioned, this only happens when testing with --as-cran. Without --as-cran the output is:

    * checking tests ...
      Running ‘test_file_subs_ext.R’
      Running ‘test_mdtangle.R’
      Running ‘test_mdweave.R’
     OK
    * checking for unstated dependencies in vignettes ... OK

Note that I already set the environment variable R_TESTS to "" (https://github.com/r-lib/testthat/issues/144; I am not using testthat). This was also needed to get the check without --as-cran running. One thing I notice is that R_LIBS and R_LIBS_SITE differ between the two ways of running R CMD check. However, I can't think why this would lead to the tests failing.

Suggestions are welcome. In case someone wants to look, the code is here: https://github.com/djvanderlaan/tinymarkdown

Thanks,
Jan
Re: [R-pkg-devel] Tests run without --as-cran and not with
Thanks! That looks relevant. I think I have found the relevant source code in pandoc, and it seems it just calls Rscript without a path, so it will probably call the dummy Rscript. Hmm, I'll have to think about how to fix that. It is probably good that R CMD check checks this, as this could cause weird errors when people have multiple versions of R on their system.

Best,
Jan

On 30-09-2021 18:59, Ivan Krylov wrote:

On Fri, 24 Sep 2021 21:48:12 +0200 Jan van der Laan wrote:

    my tests run fine when run with R CMD check, but not with R CMD check
    --as-cran <...> pandoc then calls R again with a script which is part
    of the package

Part of R CMD check --as-cran is placing fake R and Rscript executables on the PATH (but currently not on Windows):

https://github.com/r-devel/r-svn/blob/98f33a2a7b22f400d51220162cf400a0cfdc9aaf/src/library/tools/R/check.R#L279
https://github.com/r-devel/r-svn/blob/98f33a2a7b22f400d51220162cf400a0cfdc9aaf/src/library/tools/R/check.R#L6297-L6323

Does the pandoc script use the R_HOME variable to call the correct R executable?
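Since pandoc resolves the filter's Rscript interpreter via the PATH, one possible fix is to make sure the running R's own bin directory comes first on the PATH when pandoc is invoked. This is a hedged sketch, not the fix the package actually adopted; the helper name `with_r_on_path` is hypothetical:

```r
# Hypothetical helper: run `expr` with the current R installation's bin
# directory prepended to PATH, so that when pandoc invokes `Rscript` for an
# .R filter it finds the real interpreter rather than a fake one that
# `R CMD check --as-cran` places earlier on the PATH.
with_r_on_path <- function(expr) {
  old_path <- Sys.getenv("PATH")
  on.exit(Sys.setenv(PATH = old_path), add = TRUE)  # always restore PATH
  Sys.setenv(PATH = paste(R.home("bin"), old_path,
                          sep = .Platform$path.sep))
  expr  # lazily evaluated here, with the modified PATH in effect
}

# usage sketch (arguments are placeholders):
# with_r_on_path(system2("pandoc", pandoc_args))
```

Restoring the PATH via on.exit keeps the modification local to the pandoc call instead of leaking into the user's session.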
Re: [R-pkg-devel] Too many cores used in examples (not caused by data.table)
You are not the only one; I did the same with some of my examples.

Would it be an option to ask for a default R option, 'max.ncores', that specifies the maximum number of cores a process is allowed to use? CRAN could then require that examples, tests and vignettes respect this option. That way there would be one uniform option to specify the maximum number of cores processes may use. It would also make it easier for system administrators to set default values (use the entire system, or use one core by default on a shared system).

Of course, we package maintainers could do this without the involvement of R-core or CRAN. We would only need to agree on a name and a default value for when the option is missing (0 = use all cores; 1 or 2; or ncores-1, ...).

Jan

On 24-10-2023 13:03, Greg Hunt wrote:

In my case recently, after an hour or so's messing about I disabled some tests and example executions to get rid of the offending times. I doubt that I am the only one to do that.

On Tue, 24 Oct 2023 at 9:38 pm, Helske, Jouni wrote:

Thanks for the help. I now tried resubmitting with Sys.setenv("OMP_THREAD_LIMIT" = 2) at the top of the exchange example, but I still get the same note:

    Examples with CPU time > 2.5 times elapsed time
             user  system elapsed ratio
    exchange 1.196 0.04   0.159   7.774

Not sure what to try next.

Best,
Jouni

From: Ivan Krylov
Sent: Friday, October 20, 2023 16:54
To: Helske, Jouni
Cc: r-package-devel@r-project.org
Subject: Re: [R-pkg-devel] Too many cores used in examples (not caused by data.table)

On Thu, 19 Oct 2023 05:57:54, "Helske, Jouni" writes:

    But I just realised that bssm uses Armadillo via RcppArmadillo, which
    uses OpenMP by default for some elementwise operations. So, I wonder
    if that could be the culprit?

I wasn't able to reproduce the NOTE either, despite manually setting the environment variable _R_CHECK_EXAMPLE_TIMING_CPU_TO_ELAPSED_THRESHOLD=2 before running R CMD check, but I think I can see the code using OpenMP. Here's what I did:
0. Temporarily lower the system protections against capturing performance traces of potentially sensitive parts:

        echo -1 | sudo tee /proc/sys/kernel/perf_event_paranoid

   (Set it back to 3 after you're done.)

1. Run the following command with the development version of the package installed:

        env OPENBLAS_NUM_THREADS=1 \
         perf record --call-graph dwarf,4096 \
         R -e 'library(bssm); system.time(replicate(100, example(exchange)))'

   OPENBLAS_NUM_THREADS=1 will prevent OpenBLAS from spawning worker threads if you have it installed. (A different BLAS may need different environment variables.)

2. Run `perf report` and browse the collected call stack information.

The call stacks are hard to navigate, but I think they are not pointing towards Armadillo. At least, setting ARMA_OPENMP_THREADS=1 doesn't help, but setting OMP_THREAD_LIMIT=1 does.

--
Best regards,
Ivan
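The 'max.ncores' option proposed above does not exist in R today; it is only a suggested convention. A minimal sketch of how a package could honour such a convention, under the assumption that the option name and fallback default are agreed upon:

```r
# Hypothetical helper implementing the proposed (not existing) 'max.ncores'
# convention: return the number of cores a package should use, capped by the
# user/sysadmin setting and by the machine's core count.
get_max_ncores <- function(default = 2L) {
  n <- suppressWarnings(as.integer(getOption("max.ncores", default)))
  if (is.na(n) || n < 1L) n <- 1L        # treat bad values as "one core"
  dc <- parallel::detectCores()          # may be NA on exotic platforms
  if (is.na(dc)) dc <- n
  min(n, dc)
}

# usage sketch: cl <- parallel::makeCluster(get_max_ncores())
```

A sysadmin could then set `options(max.ncores = 1)` in a site-wide Rprofile, which is the uniform knob the message argues for.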
Re: [R-pkg-devel] Read and restore .Random.seed in package
Thanks! I noticed that an almost identical question was asked on this list only a few days ago, which I completely missed. Sorry for that. Your example and the examples there at least give me a better way to write my function.

Jan

On 19-09-2022 11:58, Achim Zeileis wrote:

On Mon, 19 Sep 2022, Jan van der Laan wrote:

I have a function in which I need to draw some random numbers. However, for some use cases it is necessary that the same random numbers are drawn (when the input is the same) [1]. So I would like to call set.seed in my function. This could, however, mess up the seed set by the user. So what I would like to do is store .Random.seed, call set.seed, draw my random numbers and restore .Random.seed to its original value. For an example, see the bottom of this mail.

Am I allowed on CRAN to read and restore .Random.seed in a package function? This seems to conflict with the "Packages should not modify the global environment (user’s workspace)." policy. Is there another way to get the same random numbers each time a function is called without messing up the seed set by the user? [2]

My understanding is that restoring the .Random.seed is exempt from this policy. See the first lines of stats:::simulate.lm for how R itself deals with this situation:

    simulate.lm <- function(object, nsim = 1, seed = NULL, ...) {
        if (!exists(".Random.seed", envir = .GlobalEnv, inherits = FALSE))
            runif(1)  # initialize the RNG if necessary
        if (is.null(seed))
            RNGstate <- get(".Random.seed", envir = .GlobalEnv)
        else {
            R.seed <- get(".Random.seed", envir = .GlobalEnv)
            set.seed(seed)
            RNGstate <- structure(seed, kind = as.list(RNGkind()))
            on.exit(assign(".Random.seed", R.seed, envir = .GlobalEnv))
        }
        [...]

[1] Records are randomly distributed over cluster nodes. For some use cases it is necessary that the same records end up on the same cluster nodes when the function is called multiple times.
[2] A possible solution would be to document that the user should ensure that the same seed is used when calling this function for the use cases where this is needed.

    set_seed <- function(seed, ...) {
      if (!exists(".Random.seed")) set.seed(NULL)
      old_seed <- .Random.seed
      if (length(seed) > 1) {
        .Random.seed <<- seed
      } else {
        set.seed(seed, ...)
      }
      invisible(old_seed)
    }

    foo <- function(n) {
      old_seed <- set_seed(1)
      on.exit(set_seed(old_seed))
      runif(n)
    }

Using these:

    set.seed(2)
    foo(5)
    [1] 0.2655087 0.3721239 0.5728534 0.9082078 0.2016819
    runif(5)
    [1] 0.1848823 0.7023740 0.5733263 0.1680519 0.9438393
    foo(5)
    [1] 0.2655087 0.3721239 0.5728534 0.9082078 0.2016819
    runif(5)
    [1] 0.9434750 0.1291590 0.8334488 0.4680185 0.5499837
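The simulate.lm pattern quoted above can be distilled into a small standalone helper. This is a hedged sketch modelled on that code, not something from the thread; the name `with_seed` is hypothetical (packages such as withr offer a similar, more polished facility):

```r
# Hypothetical helper modelled on stats:::simulate.lm: evaluate `expr` with a
# fixed seed, then restore the caller's .Random.seed so the user's RNG stream
# is unaffected.
with_seed <- function(seed, expr) {
  if (!exists(".Random.seed", envir = .GlobalEnv, inherits = FALSE))
    runif(1)  # initialize the RNG if necessary
  old <- get(".Random.seed", envir = .GlobalEnv)
  on.exit(assign(".Random.seed", old, envir = .GlobalEnv), add = TRUE)
  set.seed(seed)
  expr  # lazily evaluated here, after the seed is set
}
```

Calling `with_seed(1, runif(5))` then always returns the same five numbers, while a subsequent `runif()` in the user's session continues from the user's own seed as if nothing had happened.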
[R-pkg-devel] Read and restore .Random.seed in package
I have a function in which I need to draw some random numbers. However, for some use cases it is necessary that the same random numbers are drawn (when the input is the same) [1]. So I would like to call set.seed in my function. This could, however, mess up the seed set by the user. So what I would like to do is store .Random.seed, call set.seed, draw my random numbers and restore .Random.seed to its original value. For an example, see the bottom of this mail.

Am I allowed on CRAN to read and restore .Random.seed in a package function? This seems to conflict with the "Packages should not modify the global environment (user’s workspace)." policy. Is there another way to get the same random numbers each time a function is called without messing up the seed set by the user? [2]

Thanks.
Jan

[1] Records are randomly distributed over cluster nodes. For some use cases it is necessary that the same records end up on the same cluster nodes when the function is called multiple times.

[2] A possible solution would be to document that the user should ensure that the same seed is used when calling this function for the use cases where this is needed.

    set_seed <- function(seed, ...) {
      if (!exists(".Random.seed")) set.seed(NULL)
      old_seed <- .Random.seed
      if (length(seed) > 1) {
        .Random.seed <<- seed
      } else {
        set.seed(seed, ...)
      }
      invisible(old_seed)
    }

    foo <- function(n) {
      old_seed <- set_seed(1)
      on.exit(set_seed(old_seed))
      runif(n)
    }

Using these:

    > set.seed(2)
    > foo(5)
    [1] 0.2655087 0.3721239 0.5728534 0.9082078 0.2016819
    > runif(5)
    [1] 0.1848823 0.7023740 0.5733263 0.1680519 0.9438393
    > foo(5)
    [1] 0.2655087 0.3721239 0.5728534 0.9082078 0.2016819
    > runif(5)
    [1] 0.9434750 0.1291590 0.8334488 0.4680185 0.5499837
[R-pkg-devel] Issue with CPU time > 2.5 times elapsed time
I am trying to upload a new version of the reclin2 package, but it fails the pre-tests on Debian with the following message:

    * checking examples ... [14s/4s] NOTE
    Examples with CPU time > 2.5 times elapsed time
                     user  system elapsed ratio
    select_threshold 3.700 0.122  0.455   8.400
    select_n_to_m    4.228 0.180  0.623   7.075
    * checking for unstated dependencies in ‘tests’ ... OK

See https://win-builder.r-project.org/incoming_pretest/reclin2_0.3.1_20230716_124651/Debian/00check.log for the complete output.

I can't see why this happens and I can't seem to reproduce it on my machine. The examples do call makeCluster from parallel, but start only one thread. The code that is run in the examples only calls base R functions and data.table functions. I can imagine data.table starting multiple threads. However, the example consists of only 17 records, and data.table should not use more than two threads on that system anyway. So I don't see where the large difference between the two is coming from. Does anyone have a clue?

The code is here: https://github.com/djvanderlaan/reclin2; one of the examples that fails is here: https://github.com/djvanderlaan/reclin2/blob/master/R/select_threshold.R#L21.

Thanks.
Jan
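A common workaround for this class of NOTE is to cap implicit parallelism at the top of the offending example or test. A hedged sketch (whether this resolves the reclin2 NOTE specifically is an assumption; the thread above only suggests data.table threading as a suspect):

```r
# Cap implicit parallelism so CPU time stays close to elapsed time on CRAN's
# check machines. OMP_THREAD_LIMIT limits OpenMP threads in compiled code;
# setDTthreads() is data.table's own thread control.
Sys.setenv(OMP_THREAD_LIMIT = 2)
if (requireNamespace("data.table", quietly = TRUE)) {
  data.table::setDTthreads(1)
}
```

Note that, as the OMP_THREAD_LIMIT exchange earlier in this digest shows, the environment variable has to be in effect before the threaded code runs, and the real culprit may be a different library's thread pool.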
Re: [R-pkg-devel] Package required but not available: ‘arrow’
This error indicates that the arrow package is unavailable on the system where your package is checked. At https://cran.r-project.org/web/checks/check_results_arrow.html you can see that the arrow package is currently not working with clang on Fedora and Debian. This is not something that you can fix. All you can do is report this to the arrow maintainers, if it is not already reported, and wait until it is fixed.

HTH,
Jan

On 21-02-2024 23:15, Park, Sung Jae wrote:

Hi,

I’m writing to seek assistance regarding an issue we’re encountering during the submission process of our new package to CRAN. The package in question currently passes R CMD check on Windows; however, we are facing a specific error when running R CMD check on Debian. The error message we got from CRAN is as follows:

    ❯ checking package dependencies ... ERROR
      Package required but not available: ‘arrow’

      See section ‘The DESCRIPTION file’ in the ‘Writing R Extensions’
      manual.

We have ensured that the ‘arrow’ package is properly listed in the DESCRIPTION file under ‘Imports:’. Could you please provide guidance on how to resolve this? Any help will be valuable. Thank you in advance.

Best,
--Sungjae
Re: [R-pkg-devel] Question regarding finding the source file location without R-packages outside of R-Studio
Can't/don't you use relative paths?

    library(..., lib.loc = "./MyLibrary")

Then your project is perfectly portable. The only thing you need to take care of is to run your code from your project directory. In RStudio this is easily done by using projects. Outside of RStudio it depends on how and with what you are running your code; in general the program you are using to work with your R files will know where the R files are. In the terminal, for example, you can cd to your project directory and work from there.

Jan

On 23-11-2023 20:39, Tony Wilkes wrote:

Hi everyone,

I have a question. I hope it's not a stupid question. Suppose you want to perform version control and project isolation. You'd create a project folder (let's call it "MyProject") and place all the R packages you need for that project inside a subfolder (let's say "MyProject/MyLibrary"). Now you create and run an R script in "MyProject". install.packages(), library(), etc. all have a lib.loc argument to specify the library path. So one can manually specify the path of your project, and then you'd have your project isolation and version control fully set up.

But if I want to set up the library path automatically, to make it portable, I would need to determine the script location. In RStudio I can use the 'rstudioapi' package, which is very stable and so does not really require version control. But outside RStudio, I have not found a very stable package that also works. I prefer not to use external R packages that require version control (i.e. a package that changes often-ish): you'd need the package to access the project library, but the project library to access the package.

This brings me to my actual question: is it possible to determine the source file location of an R script outside of RStudio, without resorting to R packages? Or else using an R package that is VERY stable (i.e. doesn't change every (half) year, like tidyverse packages tend to do)?
commandArgs() used to contain the script path (apparently), but it doesn't work for me. By the way: I wish to get the script path in an interactive session.

Thank you in advance.

Kind regards,
Tony
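For completeness, commandArgs() does still expose the script path, but only when the script is started non-interactively with Rscript, which is exactly the limitation the question runs into. A hedged sketch (helper name `script_path` is hypothetical):

```r
# Hypothetical helper: recover the path of the running script when it was
# started as `Rscript path/to/script.R`. In an interactive session there is
# no --file argument, so this returns NA, which matches the behaviour the
# poster observed.
script_path <- function() {
  args <- commandArgs(trailingOnly = FALSE)
  m <- grep("^--file=", args, value = TRUE)
  if (length(m) == 0) return(NA_character_)
  normalizePath(sub("^--file=", "", m[[1]]))
}
```

A project could then fall back to rstudioapi inside RStudio and to getwd() (relying on the "run from the project directory" convention from the reply above) when `script_path()` returns NA.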
Re: [R-pkg-devel] Order of repo access from options("repos")
Interesting. That would also mean that putting a company repo first does not protect against dependency confusion attacks (people intentionally uploading packages to CRAN with the same name as company-internal packages; https://arstechnica.com/information-technology/2021/02/supply-chain-attack-that-fooled-apple-and-microsoft-is-attracting-copycats/).

Jan

On 01-04-2024 02:07, Greg Hunt wrote:

Martin, Dirk, Kevin,

Thanks for your help. To summarise: the order of access is undefined, and every repo URL is accessed. I'm working in an environment where "known-good" is more important than "latest", so what follows is an explanation of the problem space from my perspective.

What I am experimenting with is pinning down the versions of the packages that a moderately complex solution is built against, using a combination of an internal repository of cached packages (internally written packages, our own hopefully transient copies of packages archived from CRAN, packages live on CRAN, and packages present in both GitHub and CRAN which we build and cache locally) and a proxy that separately populates that cache in specific build processes by intercepting requests to CRAN. I'd like to use the base R function if possible, and I want to let the version numbers in the dependencies float, because a) we do need to maintain approximate currency in the versions of packages we use, and b) I have no business monkeying around with a third party's dependencies. Renv looks helpful but makes some assumptions about disk access to its cache that I'd rather avoid by running an internal repo. The team is spread around the world, so shared cache volumes are not a great idea.
The business with the multiple repo addresses is one approach to working around Docker's inability to understand that people need to access the Docker host's ports from inside a container or a build, and the fact that the current Docker treatment of the host's internal IP is far from transparent (I have scripts that run both inside and outside of Docker containers, and they used to be able to work out for themselves what environment they run in; that's got harder lately). That led down a path in which one set of addresses did not reject connection attempts, making each package installation (and there are hundreds) take some number of minutes for the connections to time out. Thankfully I don't actually have to deal with that.

We have had a few cases where our dependencies have been archived from CRAN and we have maintained our own copy for a period of days to months, a period in which we do not know what the next package version number is. It would be convenient not to have to think about that; a deterministic, terminating search of a sequence of repos looked like a nice idea for that, but I may have to do something different.

There was a recent case where a package made a breaking change in its interface in a release (not version) update that broke another package we depend on. It would be nice to be able to temporarily pin that package at its previous version (without updating the source of the third-party package that depends on it) to preserve our own build-ability while those packages sort themselves out.

There is one case where a pull request for a CRAN-hosted package was verbally accepted but never actioned, so we have our own forked version of a CRAN-hosted package which I need to decide what to do with one day soon. Another case where the package version number in CRAN differs from the one we want. We have a dependency on a package that we build from a Git repo but which is also present in CRAN.
I don't want to be dependent on the maintainers keeping the package version in the Git copy of the DESCRIPTION file higher than the version in CRAN. Ideally I'd like to build and push to the internal repo and not have to think about it after that. The same issue as before arises: as it stands today, I have to either worry about (and probably edit) the version number in the build, or manage the cache population process so the internal package instance is added after any CRAN-sourced dependencies and make sure that the public CRAN instances are not accessed in the build.

All of these problems are soluble by special-casing the affected installs, specifically managing the cache population (with a requirement that the cache and CRAN not be searched at the same time), or editing version numbers whose next values I do not control, but I would like to try for the simplest approach first. I know I'm not going to get a clean solution here; the relative weights of "known-good" and "latest" are different depending on where you stand.

Greg

On Sun, 31 Mar 2024 at 22:43, Martin Morgan wrote:

available.packages indicates that

    By default, the return value includes only packages whose version and
    OS requirements are met by the running version of R, and only gives
    information on the latest versions of packages.

So all repositories are
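One way to make the "which repo wins" question concrete is to look at the merged package listing before R's default "duplicates" filter discards all but the highest version. A hedged sketch (the helper name and the exact filter set are assumptions; note that calling it requires network access to the configured repositories):

```r
# Hypothetical helper: report packages offered by more than one configured
# repository, with their versions, before available.packages()'s default
# "duplicates" filter silently keeps only the highest-version entry.
duplicated_across_repos <- function(repos = getOption("repos")) {
  ap <- available.packages(repos = repos,
                           filters = c("R_version", "OS_type", "subarch"))
  dups <- ap[, "Package"] %in% ap[duplicated(ap[, "Package"]), "Package"]
  ap[dups, c("Package", "Version", "Repository"), drop = FALSE]
}
```

Running this against an internal repo plus CRAN would show exactly which packages are exposed to the dependency confusion and version pinning issues discussed above.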