[R-pkg-devel] Run garbage collector when too many open files

2018-08-07 Thread Jan van der Laan




In my package I open handles to temporary files from C++; handles to 
them are returned to R through vptr objects. The files are deleted when 
the corresponding R object is deleted and the garbage collector runs:


a <- lvec(10, "integer")
rm(a)

When the garbage collector runs, the file is deleted. However, on 
some platforms (probably those with lower limits on the maximum number 
of file handles a process can have open), I run into the problem that 
the garbage collector doesn't run often enough. In that case another 
package of mine, which uses this package, generates an error when its 
tests are run.


The simplest solution is to add some calls to gc() in my tests, but a 
more general/automatic solution would be nice.


I thought about something along the lines of

robust_lvec <- function(...) {
  tryCatch({
    lvec(...)
  }, error = function(e) {
    gc()
    lvec(...)  # duplicated code
  })
}

i.e. try to open a file and, when that fails, call the garbage 
collector and try again. However, this introduces duplicated code (in 
this case only one line, but it can be more), and it doesn't help if 
another function tries to open a file.
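
A generic retry wrapper would at least remove the duplication. A 
minimal sketch (with_gc_retry is a hypothetical helper, not part of the 
package):

with_gc_retry <- function(f, ...) {
  tryCatch(f(...), error = function(e) {
    gc()    # run finalizers, which close handles of collected objects
    f(...)  # retry once
  })
}

a <- with_gc_retry(lvec, 10, "integer")

It still only helps for calls that are routed through it, though.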


Is there a better solution?

Thanks!

Jan

__
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel


[R-pkg-devel] Tests run without --as-cran and not with

2021-09-24 Thread Jan van der Laan



I am trying to solve an issue where my tests run fine when run with R 
CMD check, but not with R CMD check --as-cran. In the tests pandoc is 
called using system(); pandoc then calls R again with a script which is 
part of the package. That last part seems to fail; see the error 
message below:


  Running ‘test_mdweave.R’
 ERROR
Running the tests in ‘tests/test_mdweave.R’ failed.
Last 13 lines of output:
  > message("Weave file")
  Weave file
  > fn <- system.file(file.path("examples", md), package = "tinymarkdown")
  > mdweave(fn)
  Error running filter 
/home/eoos/git/tinymarkdown/work/tinymarkdown.Rcheck/tinymarkdown/scripts/filter.R:

  Filter returned error status 1
  >


As I mentioned, this only happened when testing with --as-cran. Without 
--as-cran the output is:


* checking tests ...
  Running ‘test_file_subs_ext.R’
  Running ‘test_mdtangle.R’
  Running ‘test_mdweave.R’
 OK
* checking for unstated dependencies in vignettes ... OK


Note that I already set the environment variable R_TESTS to "" 
(https://github.com/r-lib/testthat/issues/144; I am not using testthat). 
This was also needed to get the check without --as-cran running.


One thing I notice is that R_LIBS and R_LIBS_SITE differ between the two 
ways of running R CMD check. However, I can't think why this would lead 
to the tests failing.


Suggestions are welcome.

In case someone wants to look, the code is here:
https://github.com/djvanderlaan/tinymarkdown

Thanks,

Jan

__
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel


Re: [R-pkg-devel] Tests run without --as-cran and not with

2021-10-01 Thread Jan van der Laan

Thanks! That looks relevant.

I think I have found the relevant source code in pandoc, and it seems it 
just calls Rscript without a path, so it will probably pick up the dummy 
Rscript. Hmm, I'll have to think about how to fix that.
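
One option might be to make pandoc's bare "Rscript" resolve to the 
running R. A minimal sketch, assuming mdweave() can adjust the 
environment just before it invokes pandoc (the args variable is 
illustrative):

# Put the running R's bin directory first on the PATH, so pandoc's
# plain "Rscript" call finds the real interpreter, not the dummy one:
old_path <- Sys.getenv("PATH")
on.exit(Sys.setenv(PATH = old_path), add = TRUE)
Sys.setenv(PATH = paste(R.home("bin"), old_path, sep = .Platform$path.sep))
system2("pandoc", args)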


It is probably good that R CMD check checks this, as it could cause 
weird errors when people have multiple versions of R on their system.


Best,
Jan



On 30-09-2021 18:59, Ivan Krylov wrote:

On Fri, 24 Sep 2021 21:48:12 +0200
Jan van der Laan  wrote:


my tests run fine when run with R CMD check, but not with R CMD check
--as-cran


<...>


pandoc then calls R again with a script which is part of the package


Part of R CMD check --as-cran is placing fake R and Rscript executables
on the PATH (but currently not on Windows):

https://github.com/r-devel/r-svn/blob/98f33a2a7b22f400d51220162cf400a0cfdc9aaf/src/library/tools/R/check.R#L279

https://github.com/r-devel/r-svn/blob/98f33a2a7b22f400d51220162cf400a0cfdc9aaf/src/library/tools/R/check.R#L6297-L6323

Does the pandoc script use the R_HOME variable to call the correct R
executable?



__
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel


Re: [R-pkg-devel] Too many cores used in examples (not caused by data.table)

2023-10-24 Thread Jan van der Laan

You are not the only one; I did the same with some of my examples.

Would it be an option to ask for a default R option, 'max.ncores', that 
specifies the maximum number of cores a process is allowed to use? CRAN 
could then require that examples, tests and vignettes respect this 
option. That way there would be one uniform option for specifying the 
maximum number of cores a process may use. It would also make it easier 
for system administrators to set default values (use the entire system, 
or one core by default on a shared system).


Of course, we package maintainers could do this without the involvement 
of R Core or CRAN. We only need to agree on a name and a default value 
for when the option is missing (0 = use all cores; 1 or 2; or ncores-1 ...).
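
A minimal sketch of such a convention; the option name 'max.ncores' and 
its default are part of the proposal, not an existing R option:

max_ncores <- function(default = 2L) {
  ncores <- getOption("max.ncores", default = default)
  if (ncores == 0) ncores <- parallel::detectCores()  # 0 = use all cores
  ncores
}

cl <- parallel::makeCluster(max_ncores())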


Jan


On 24-10-2023 13:03, Greg Hunt wrote:

In my case recently, after an hour or so’s messing about, I disabled some
tests and example executions to get rid of the offending times. I doubt
that I am the only one to have done that.

On Tue, 24 Oct 2023 at 9:38 pm, Helske, Jouni  wrote:


Thanks for the help, I now tried resubmitting with
Sys.setenv("OMP_THREAD_LIMIT" = 2) at the top of the exchange example, but
I still get the same note:

Examples with CPU time > 2.5 times elapsed time
            user system elapsed ratio
exchange   1.196   0.04   0.159 7.774

Not sure what to try next.

Best,
Jouni

From: Ivan Krylov 
Sent: Friday, October 20, 2023 16:54
To: Helske, Jouni 
Cc: r-package-devel@r-project.org 
Subject: Re: [R-pkg-devel] Too many cores used in examples (not caused by
data.table)

On Thu, 19 Oct 2023 05:57:54 +
"Helske, Jouni"  wrote:


But I just realised that bssm uses Armadillo via RcppArmadillo, which
uses OpenMP by default for some elementwise operations. So, I wonder
if that could be the culprit?


I wasn't able to reproduce the NOTE either, despite manually setting
the environment variable
_R_CHECK_EXAMPLE_TIMING_CPU_TO_ELAPSED_THRESHOLD=2 before running R CMD
check, but I think I can see the code using OpenMP. Here's what I did:

0. Temporarily lower the system protections against capturing
performance traces of potentially sensitive parts:

echo -1 | sudo tee /proc/sys/kernel/perf_event_paranoid

(Set it back to 3 after you're done.)

1. Run the following command with the development version of the
package installed:

env OPENBLAS_NUM_THREADS=1 \
  perf record --call-graph dwarf,4096 \
  R -e 'library(bssm); system.time(replicate(100, example(exchange)))'

OPENBLAS_NUM_THREADS=1 will prevent OpenBLAS from spawning worker
threads if you have it installed. (A different BLAS may need different
environment variables.)

2. Run `perf report` and browse collected call stack information.

The call stacks are hard to navigate, but I think they are not pointing
towards Armadillo. At least, setting ARMA_OPENMP_THREADS=1 doesn't
help, but setting OMP_THREAD_LIMIT=1 does.

--
Best regards,
Ivan



__
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel


Re: [R-pkg-devel] Read and restore .Random.seed in package

2022-09-19 Thread Jan van der Laan

Thanks!

I noticed that an almost identical question was asked on this list only 
a few days ago, which I completely missed. Sorry for that. Your example 
and the examples there at least give me a better way to write my 
function.

Jan


On 19-09-2022 11:58, Achim Zeileis wrote:

On Mon, 19 Sep 2022, Jan van der Laan wrote:



I have a function in which I need to draw some random numbers. 
However, for some use cases, it is necessary that the same random 
numbers are drawn (when the input is the same) [1]. So I would like to 
do a set.seed in my function. This could, however, mess up the seed 
set by the user. So what I would like to do is store .Random.seed, 
call set.seed, draw my random numbers and restore .Random.seed to its 
original value. For an example see the bottom of the mail.


Am I allowed on CRAN to read and restore .Random.seed in a package 
function? This seems to conflict with the "Packages should not modify 
the global environment (user’s workspace)." policy. Is there another 
way to get the same random numbers each time a function is called 
without messing up the seed set by the user? [2]


My understanding is that restoring the .Random.seed is exempt from this 
policy. See the first lines of stats:::simulate.lm for how R itself 
deals with this situation:


simulate.lm <- function(object, nsim = 1, seed = NULL, ...)
{
    if (!exists(".Random.seed", envir = .GlobalEnv, inherits = FALSE))
        runif(1)                     # initialize the RNG if necessary
    if (is.null(seed))
        RNGstate <- get(".Random.seed", envir = .GlobalEnv)
    else {
        R.seed <- get(".Random.seed", envir = .GlobalEnv)
        set.seed(seed)
        RNGstate <- structure(seed, kind = as.list(RNGkind()))
        on.exit(assign(".Random.seed", R.seed, envir = .GlobalEnv))
    }

[...]
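
The key ingredient is the on.exit() call, which restores the saved seed 
even when the function later exits with an error. A condensed sketch of 
the same pattern for a package function (the name with_fixed_seed is 
illustrative):

with_fixed_seed <- function(seed, expr) {
  if (!exists(".Random.seed", envir = .GlobalEnv, inherits = FALSE))
    runif(1)  # make sure .Random.seed exists
  old <- get(".Random.seed", envir = .GlobalEnv)
  on.exit(assign(".Random.seed", old, envir = .GlobalEnv))
  set.seed(seed)
  expr        # the promise is forced here, after set.seed()
}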



[1] Records are randomly distributed over cluster nodes. For some use 
cases it is necessary that the same records end up on the same cluster 
nodes when the function is called multiple times.


[2] A possible solution would be to document that the user should 
ensure that the same seed is used when calling this function for the 
use cases where this is needed.



set_seed <- function(seed, ...) {
  if (!exists(".Random.seed")) set.seed(NULL)
  old_seed <- .Random.seed
  if (length(seed) > 1) {
    .Random.seed <<- seed
  } else {
    set.seed(seed, ...)
  }
  invisible(old_seed)
}

foo <- function(n) {
  old_seed <- set_seed(1)
  on.exit(set_seed(old_seed))
  runif(n)
}

Using these:


> set.seed(2)
> foo(5)
[1] 0.2655087 0.3721239 0.5728534 0.9082078 0.2016819
> runif(5)
[1] 0.1848823 0.7023740 0.5733263 0.1680519 0.9438393
> foo(5)
[1] 0.2655087 0.3721239 0.5728534 0.9082078 0.2016819
> runif(5)
[1] 0.9434750 0.1291590 0.8334488 0.4680185 0.5499837





__
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel


[R-pkg-devel] Read and restore .Random.seed in package

2022-09-19 Thread Jan van der Laan



I have a function in which I need to draw some random numbers. However, 
for some use cases, it is necessary that the same random numbers are 
drawn (when the input is the same) [1]. So I would like to do a set.seed 
in my function. This could, however, mess up the seed set by the user. 
So what I would like to do is store .Random.seed, call set.seed, draw my 
random numbers and restore .Random.seed to its original value. For an 
example see the bottom of the mail.


Am I allowed on CRAN to read and restore .Random.seed in a package 
function? This seems to conflict with the "Packages should not modify 
the global environment (user’s workspace)." policy. Is there another way 
to get the same random numbers each time a function is called without 
messing up the seed set by the user? [2]


Thanks.
Jan

[1] Records are randomly distributed over cluster nodes. For some use 
cases it is necessary that the same records end up on the same cluster 
nodes when the function is called multiple times.


[2] A possible solution would be to document that the user should ensure 
that the same seed is used when calling this function for the use cases 
where this is needed.



set_seed <- function(seed, ...) {
  if (!exists(".Random.seed")) set.seed(NULL)
  old_seed <- .Random.seed
  if (length(seed) > 1) {
    .Random.seed <<- seed
  } else {
    set.seed(seed, ...)
  }
  invisible(old_seed)
}

foo <- function(n) {
  old_seed <- set_seed(1)
  on.exit(set_seed(old_seed))
  runif(n)
}

Using these:

> set.seed(2)
> foo(5)
[1] 0.2655087 0.3721239 0.5728534 0.9082078 0.2016819
> runif(5)
[1] 0.1848823 0.7023740 0.5733263 0.1680519 0.9438393
> foo(5)
[1] 0.2655087 0.3721239 0.5728534 0.9082078 0.2016819
> runif(5)
[1] 0.9434750 0.1291590 0.8334488 0.4680185 0.5499837

__
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel


[R-pkg-devel] Issue with CPU time > 2.5 times elapsed time

2023-07-16 Thread Jan van der Laan



I am trying to upload a new version of the reclin2 package, but it fails 
the pre-tests on Debian with the following message:


> * checking examples ... [14s/4s] NOTE
> Examples with CPU time > 2.5 times elapsed time
>                   user system elapsed ratio
> select_threshold 3.700  0.122   0.455 8.400
> select_n_to_m    4.228  0.180   0.623 7.075
> * checking for unstated dependencies in ‘tests’ ... OK

See
https://win-builder.r-project.org/incoming_pretest/reclin2_0.3.1_20230716_124651/Debian/00check.log 
for the complete output.


I can't see why this happens, and I can't seem to reproduce it on my 
machine. The examples do call makeCluster from parallel, but start only 
one worker. The code that is run in the examples calls only base R 
functions and data.table functions. I can imagine data.table starting 
multiple threads; however, the example consists of only 17 records, and 
data.table should not use more than two threads on that system anyway. 
So I don't see where the large difference between the two timings comes 
from. Does anyone have a clue?


The code is here: https://github.com/djvanderlaan/reclin2; one of the 
examples that fails is here: 
https://github.com/djvanderlaan/reclin2/blob/master/R/select_threshold.R#L21.
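
For what it's worth, the workaround mentioned elsewhere on this list is 
to cap the implicit thread pools at the top of the affected examples. A 
sketch; that OpenMP or data.table threading is actually the cause here 
is an assumption, not something I have confirmed:

Sys.setenv(OMP_THREAD_LIMIT = 2)   # cap OpenMP worker threads
data.table::setDTthreads(2)        # cap data.table's own thread pool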



Thanks.
Jan

__
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel


Re: [R-pkg-devel] Package required but not available: ‘arrow’

2024-02-22 Thread Jan van der Laan
This error indicates that the arrow package is unavailable on the system 
where your package is checked. At 
https://cran.r-project.org/web/checks/check_results_arrow.html you can 
see that the arrow package is currently not working with clang on Fedora 
and Debian. This is not something you can fix yourself. All you can do 
is report it to the arrow maintainers, if it has not been reported 
already, and wait until it is fixed.


HTH,
Jan


On 21-02-2024 23:15, Park, Sung Jae wrote:

Hi,

I’m writing to seek assistance with an issue we’re encountering during 
the submission of our new package to CRAN.
The package in question passes R CMD check on Windows; however, we get a 
specific error when running R CMD check on Debian. The error message 
we’ve got from CRAN is as follows:

```
❯ checking package dependencies ... ERROR
   Package required but not available: ‘arrow’

   See section ‘The DESCRIPTION file’ in the ‘Writing R Extensions’
manual.
```

We have ensured that the ‘arrow’ package is properly listed in the 
DESCRIPTION file under ‘Imports:’.
Could you please provide guidance on how to resolve this? Any help would 
be appreciated.

Thank you in advance.

Best,
--Sungjae





__
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel


Re: [R-pkg-devel] Question regarding finding the source file location without R-packages outside of R-Studio

2023-11-23 Thread Jan van der Laan



Can't/don't you use relative paths?

library(..., lib.loc = "./MyLibrary")

Then your project is perfectly portable. The only thing you need to take 
care of is to run your code from your project directory. In RStudio this 
is easily done by using projects. Outside of RStudio it depends on how 
and with what you are running your code; in general, the program you use 
to work with your R files will know where those files are. In the 
terminal, for example, you can cd to your project directory and work 
from there.
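
Instead of passing lib.loc to every call, you can also prepend the 
project library to the search path once. A minimal sketch, assuming the 
script is always started from the project root "MyProject" (somepackage 
is a placeholder):

.libPaths(c(normalizePath("MyLibrary"), .libPaths()))
library(somepackage)  # now resolved from MyProject/MyLibrary first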


Jan

On 23-11-2023 20:39, Tony Wilkes wrote:

Hi everyone,

I have a question. I hope it's not a stupid question.

Suppose you'd want to perform version control and project isolation. You'd create a project folder (let's 
call it "MyProject"), and place all the R packages you need for that project inside a subfolder 
(let's say "MyProject/MyLibrary"). Now you create and run an R-script in "MyProject".
install.packages(), library(), etc. all have a lib.loc argument to specify the 
library path. So one can manually specify the path of your project, and then 
you'd have your project isolation and version control fully set up.

But if I want to set up the library path automatically, to make it 
portable, I need to determine the script location. In RStudio I can use 
the 'rstudioapi' package, which is very stable and so does not really 
require version control. But outside RStudio I have not found a very 
stable package that works.
I prefer not to use external R packages that require version control 
(i.e. packages that change often-ish): you'd need the package to access 
the project library, but the project library to access the package.

This brings me to my actual question: is it possible to determine the 
source file location of an R script outside of RStudio, without 
resorting to R packages? Or else using an R package that is VERY stable 
(i.e. doesn't change every (half) year, like tidyverse packages tend to 
do)? commandArgs() used to contain the script path (apparently), but it 
doesn't work for me.
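
The commandArgs() approach I am referring to looks like this (a sketch); 
as far as I understand, it only yields the path when the script is run 
non-interactively via Rscript:

# Only works when started as "Rscript myscript.R", never interactively:
args <- commandArgs(trailingOnly = FALSE)
file_arg <- grep("^--file=", args, value = TRUE)
script_path <- if (length(file_arg) > 0) sub("^--file=", "", file_arg) else NA_character_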

By the way: I wish to get the script path in an interactive session.

Thank you in advance.

Kind regards,

Tony




__
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel


Re: [R-pkg-devel] Order of repo access from options("repos")

2024-04-02 Thread Jan van der Laan
Interesting. That would also mean that putting a company repo first does 
not protect against dependency confusion attacks (people intentionally 
uploading packages to CRAN with the same name as company-internal 
packages; 
https://arstechnica.com/information-technology/2021/02/supply-chain-attack-that-fooled-apple-and-microsoft-is-attracting-copycats/).
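
A quick sketch of the consequence (the internal repo URL and package 
name are made up): with the default filters, install.packages() consults 
all configured repos and prefers the highest version found, so repo 
order only breaks ties between equal versions:

options(repos = c(internal = "https://cran.example.com",
                  CRAN     = "https://cloud.r-project.org"))
install.packages("internalpkg")  # a higher-versioned impostor on CRAN still wins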



Jan



On 01-04-2024 02:07, Greg Hunt wrote:

Martin, Dirk, Kevin,
Thanks for your help.  To summarise: the order of access is undefined, and
every repo URL is accessed.   I'm working in an environment
where "known-good" is more important than "latest", so what follows is an
explanation of the problem space from my perspective.

What I am experimenting with is pinning down the versions of the packages
that a moderately complex solution is built against using a combination of
an internal repository of cached packages (internally written packages, our
own hopefully transient copies of packages archived from CRAN,
packages live on CRAN, and packages present in both Github and CRAN which
we build and cache locally) and a proxy that separately populates that
cache in specific build processes by intercepting requests to CRAN.  I'd
like to use the base R function if possible and I want to let the version
numbers in the dependencies float because a) we do need to maintain
approximate currency in what versions of packages we use and b) I have no
business monkeying around with third party's dependencies.  Renv looks
helpful but has some assumptions about disk access to its cache that I'd
rather avoid by running an internal repo.  The team is spread around the
world, so shared cache volumes are not a great idea.

The business with the multiple repo addresses is one approach to working
around Docker's inability to understand that people need to access the
Docker host's ports from inside a container or a build, and that the
current Docker treatment of the host's internal IP is far from transparent
(I have scripts that run both inside and outside of Docker containers and
they used to be able to work out for themselves what environment they run
in; that's got harder lately).  That led down a path in which one set of
addresses did not reject connection attempts, making each package
installation (and there are hundreds) take some number of minutes for the
connections to time out.  Thankfully I don't actually have to deal with
that.

We have had a few cases where our dependencies have been archived from CRAN
and we have maintained our own copy for a period of days to months, a
period in which we do not know what the next package version number is.  It
would be convenient to not have to think about that - a deterministic,
terminating search of a sequence of repos looked like a nice idea for that,
but I may have to do something different.

There was a recent case where a package made a breaking change in its
interface in a release (not version) update that broke another package we
depend on.  It would be nice to be able to temporarily pin that package at
its previous version (without updating the source of the third party
package that depends on it) to preserve our own build-ability while those
packages sort themselves out.

There is one case where a pull request for a CRAN-hosted package was
verbally accepted but never actioned so we have our own forked version of a
CRAN-hosted package which I need to decide what to do with one day soon.
Another case where the package version number is different in CRAN from the
one we want.

We have a dependency on a package that we build from a Git repo but which
is also present in CRAN.  I don't want to be dependent on the maintainers
keeping the package version in the Git copy of the DESCRIPTION file higher
than the version in CRAN.  Ideally I'd like to build and push to the
internal repo and not have to think about it after that. Same issue as
before arises, as it stands today I have to either worry about, and
probably edit, the version number in the build or manage the cache
population process so the internal package instance is added after any
CRAN-sourced dependencies and make sure that the public CRAN instances are
not accessed in the build.

All of these problems are soluble by special-casing the affected installs,
specifically managing the cache population (with a requirement that the
cache and CRAN not be searched at the same time), or editing version
numbers whose next values I do not control, but I would like to try for the
simplest approach first. I know I'm not going to get a clean solution here,
the relative weights of "known-good" and "latest" are different
depending on where you stand.


Greg

On Sun, 31 Mar 2024 at 22:43, Martin Morgan  wrote:


available.packages indicates that

    By default, the return value includes only packages whose version
    and OS requirements are met by the running version of R, and only
    gives information on the latest versions of packages.

So all repositories are