Re: [R-pkg-devel] Order of repo access from options("repos")

2024-03-31 Thread Greg Hunt
Martin, Dirk, Kevin,
Thanks for your help.  To summarise: the order of access is undefined, and
every repo URL is accessed.   I'm working in an environment
where "known-good" is more important than "latest", so what follows is an
explanation of the problem space from my perspective.

What I am experimenting with is pinning down the versions of the packages
that a moderately complex solution is built against using a combination of
an internal repository of cached packages (internally written packages, our
own hopefully transient copies of packages archived from CRAN,
packages live on CRAN, and packages present in both Github and CRAN which
we build and cache locally) and a proxy that separately populates that
cache in specific build processes by intercepting requests to CRAN.  I'd
like to use the base R function if possible and I want to let the version
numbers in the dependencies float because a) we do need to maintain
approximate currency in what versions of packages we use and b) I have no
business monkeying around with third parties' dependencies.  Renv looks
helpful but has some assumptions about disk access to its cache that I'd
rather avoid by running an internal repo.  The team is spread around the
world, so shared cache volumes are not a great idea.

The business with the multiple repo addresses is one approach to working
around Docker's inability to understand that people need to access the
Docker host's ports from inside a container or a build, and that the
current Docker treatment of the host's internal IP is far from transparent
(I have scripts that run both inside and outside of Docker containers and
they used to be able to work out for themselves what environment they run
in; that's got harder lately).  That led down a path in which one set of
addresses did not reject connection attempts, making each package
installation (and there are hundreds) take some number of minutes for the
connections to time out.  Thankfully I don't actually have to deal with
that.

We have had a few cases where our dependencies have been archived from CRAN
and we have maintained our own copy for a period of days to months, a
period in which we do not know what the next package version number is.  It
would be convenient to not have to think about that - a deterministic,
terminating search of a sequence of repos looked like a nice idea for that,
but I may have to do something different.
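
A first-match search of that kind could perhaps be layered on top of
available.packages() by leaving out the built-in "duplicates" filter and
keeping the first row seen for each package.  This is only a sketch under
assumptions: make_repo() and first_match_available() are hypothetical
helpers, the two file:// repositories are throwaway stand-ins for the
internal cache and CRAN, Unix-style paths are assumed, and it relies on the
rows coming back in the order the repositories are listed.

```r
# Hypothetical helper: build a throwaway local repository whose index
# lists one package at the given version (no tarball is created).
make_repo <- function(pkg, version) {
  root <- tempfile("repo")
  contrib <- file.path(root, "src", "contrib")
  dir.create(contrib, recursive = TRUE)
  write.dcf(cbind(Package = pkg, Version = version),
            file = file.path(contrib, "PACKAGES"))
  paste0("file://", normalizePath(root, winslash = "/"))
}

# Hypothetical helper: query every repository, skip the "duplicates"
# filter (which would keep the highest version), and instead keep the
# first occurrence of each package in repository order.
first_match_available <- function(repos) {
  db <- available.packages(
    repos = repos, type = "source",
    filters = c("R_version", "OS_type", "subarch")
  )
  db[!duplicated(db[, "Package"]), , drop = FALSE]
}

internal <- make_repo("foo", "1.0.0")  # stands in for the internal cache
cran     <- make_repo("foo", "2.0.0")  # stands in for CRAN

db <- first_match_available(c(internal = internal, CRAN = cran))
db[, c("Package", "Version", "Repository"), drop = FALSE]
```

The resulting matrix could then be handed to install.packages() through
its available argument.  Note that every repository index is still
fetched; only the selection is first-match.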

There was a recent case where a package made a breaking change in its
interface in a release (not version) update that broke another package we
depend on.  It would be nice to be able to temporarily pin that package at
its previous version (without updating the source of the third party
package that depends on it) to preserve our own build-ability while those
packages sort themselves out.
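
One possible way to express such a temporary pin is a custom filter over
the matrix that available.packages() returns.  The sketch below is an
assumption-laden illustration, not something tested against a real build:
pin_at_most() is a hypothetical helper, and "brokenpkg" and its version
numbers are made up.

```r
# Hypothetical filter factory: returns a function that drops every
# offering of `pkg` with a version above `max_version`.
pin_at_most <- function(pkg, max_version) {
  function(db) {
    too_new <- db[, "Package"] == pkg &
      package_version(db[, "Version"]) > package_version(max_version)
    db[!too_new, , drop = FALSE]
  }
}

# A hand-built index with just the two columns the filter needs.
db <- cbind(Package = c("brokenpkg", "brokenpkg", "other"),
            Version = c("1.4.2", "1.5.0", "0.3.1"))

pinned <- pin_at_most("brokenpkg", "1.4.2")(db)
pinned
```

To take effect the filter would have to run before the built-in
"duplicates" filter, for example
available.packages(filters = list("R_version", "OS_type", "subarch",
pin_at_most("brokenpkg", "1.4.2"), "duplicates")), with the result passed
to install.packages() via its available argument.  It only helps if the
older version is still offered by one of the repositories.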

There is one case where a pull request for a CRAN-hosted package was
verbally accepted but never actioned so we have our own forked version of a
CRAN-hosted package which I need to decide what to do with one day soon.
Another case where the package version number is different in CRAN from the
one we want.

We have a dependency on a package that we build from a Git repo but which
is also present in CRAN.  I don't want to be dependent on the maintainers
keeping the package version in the Git copy of the DESCRIPTION file higher
than the version in CRAN.  Ideally I'd like to build and push to the
internal repo and not have to think about it after that. The same issue as
before arises: as it stands today I have to either worry about, and
probably edit, the version number in the build or manage the cache
population process so the internal package instance is added after any
CRAN-sourced dependencies and make sure that the public CRAN instances are
not accessed in the build.

All of these problems are soluble by special-casing the affected installs,
specifically managing the cache population (with a requirement that the
cache and CRAN not be searched at the same time), or editing version
numbers whose next values I do not control, but I would like to try for the
simplest approach first. I know I'm not going to get a clean solution here:
the relative weights of "known-good" and "latest" differ depending on
where you stand.


Greg


Re: [R-pkg-devel] Order of repo access from options("repos")

2024-03-31 Thread Kevin Ushey
It may also be useful to use:

options(internet.info = 1)

to get more information on the web requests R is making. (See the
documentation in ?options for more details.)

Looking at the source code in available.packages, R does iterate
through the repositories in the same order they're provided, so I'd
suspect some kind of other issue. (Output somehow getting misordered
in your build logs? Repository options being unexpectedly reordered in
your CI build somewhere?)

FWIW, I think the order that repositories are visited could matter if
the same package is offered by multiple repositories -- the selected
repository could depend on the order of declaration.
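
A small offline experiment along these lines, using throwaway file://
repositories, might look like the sketch below.  make_repo() and pick()
are hypothetical helpers, "foo" is a made-up package, Unix-style paths are
assumed, and the tie-breaking seen in the second half reflects the current
implementation as far as I can tell rather than documented behaviour.

```r
# Hypothetical helper: build a throwaway local repository whose index
# lists one package at the given version (no tarball is created).
make_repo <- function(pkg, version) {
  root <- tempfile("repo")
  contrib <- file.path(root, "src", "contrib")
  dir.create(contrib, recursive = TRUE)
  write.dcf(cbind(Package = pkg, Version = version),
            file = file.path(contrib, "PACKAGES"))
  paste0("file://", normalizePath(root, winslash = "/"))
}

old_repo <- make_repo("foo", "1.0.0")
new_repo <- make_repo("foo", "2.0.0")
tie_repo <- make_repo("foo", "2.0.0")

# Hypothetical helper: which version of "foo", from which repository,
# survives the default filters (including "duplicates")?
pick <- function(repos) {
  db <- available.packages(repos = repos, type = "source")
  unname(db["foo", c("Version", "Repository")])
}

# Different versions: compare the result for both orderings.
pick(c(A = old_repo, B = new_repo))
pick(c(A = new_repo, B = old_repo))

# Equal versions: compare which repository is reported for each ordering.
pick(c(A = new_repo, B = tie_repo))
pick(c(A = tie_repo, B = new_repo))
```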

Best,
Kevin


__
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel


Re: [R-pkg-devel] Order of repo access from options("repos")

2024-03-31 Thread Dirk Eddelbuettel


On 31 March 2024 at 11:43, Martin Morgan wrote:
| So all repositories are consulted and then the result filtered to contain just
| the most recent version of each. Does it matter then what order the
| repositories are visited?

Right. I fall for that too often, as I did here.  The order matters for
.libPaths() where the first match is used; for package install the highest
version number (from any entry in getOption("repos")) wins.

Thanks for catching my thinko.

Dirk

-- 
dirk.eddelbuettel.com | @eddelbuettel | e...@debian.org



Re: [R-pkg-devel] Order of repo access from options("repos")

2024-03-31 Thread Martin Morgan
available.packages indicates that

 By default, the return value includes only packages whose version
 and OS requirements are met by the running version of R, and only
 gives information on the latest versions of packages.

So all repositories are consulted and then the result filtered to contain just 
the most recent version of each. Does it matter then what order the 
repositories are visited?

Martin Morgan



Re: [R-pkg-devel] Order of repo access from options("repos")

2024-03-31 Thread Greg Hunt
Dirk,
Sadly I can't use localhost for all of those.  172.17.0.1 is an internal
Docker IP, not the localhost address (127.0.0.1); the two sets of addresses
are there to handle two different scenarios, and different ones will fail
to resolve in different scenarios.  Are you saying that the DNS lookup adds
a timing issue to the search order?  Isn't the list deterministically
ordered?


Greg



Re: [R-pkg-devel] Order of repo access from options("repos")

2024-03-31 Thread Dirk Eddelbuettel


Greg,

There are AFAICT two issues here: how R unrolls the named vector that is the
'repos' element in the list 'options', and how your computer resolves DNS for
localhost vs 172.17.0.1.  I would try something like

   options(repos = c(CRAN = "http://localhost:3001/proxy",
                     C = "http://localhost:3002",
                     B = "http://localhost:3003/proxy",
                     A = "http://localhost:3004"))

or the equivalent with 172.17.0.1. When I do that here I get errors from
first to last as we expect:

   > options(repos = c(CRAN = "http://localhost:3001/proxy",
                       C = "http://localhost:3002",
                       B = "http://localhost:3003/proxy",
                       A = "http://localhost:3004"))
   > available.packages()
   Warning: unable to access index for repository http://localhost:3001/proxy/src/contrib:
     cannot open URL 'http://localhost:3001/proxy/src/contrib/PACKAGES'
   Warning: unable to access index for repository http://localhost:3002/src/contrib:
     cannot open URL 'http://localhost:3002/src/contrib/PACKAGES'
   Warning: unable to access index for repository http://localhost:3003/proxy/src/contrib:
     cannot open URL 'http://localhost:3003/proxy/src/contrib/PACKAGES'
   Warning: unable to access index for repository http://localhost:3004/src/contrib:
     cannot open URL 'http://localhost:3004/src/contrib/PACKAGES'
        Package Version Priority Depends Imports LinkingTo Suggests Enhances
        License License_is_FOSS License_restricts_use OS_type Archs MD5sum
        NeedsCompilation File Repository
   >

Dirk

-- 
dirk.eddelbuettel.com | @eddelbuettel | e...@debian.org



[R-pkg-devel] Order of repo access from options("repos")

2024-03-31 Thread Greg Hunt
When I set multiple repositories in options(repos=...) the order of access
is providing me with some surprises as I work through some CI/CD issues:

Given:

options(
  repos = c(
    CRAN = "http://localhost:3001/proxy",
    C = "http://172.17.0.1:3002",
    B = "http://172.17.0.1:3001/proxy",
    A = "http://localhost:3002"
  )
)


the order in the build log after this is:

#12 178.7 Warning: unable to access index for repository http://localhost:3001/proxy/src/contrib:
#12 178.7   cannot open URL 'http://localhost:3001/proxy/src/contrib/PACKAGES'
#12 178.7 Warning: unable to access index for repository http://172.17.0.1:3002/src/contrib:
#12 178.7   cannot open URL 'http://172.17.0.1:3002/src/contrib/PACKAGES'
#12 178.9 Warning: unable to access index for repository http://localhost:3002/src/contrib:
#12 178.9   cannot open URL 'http://localhost:3002/src/contrib/PACKAGES'
#12 179.0 trying URL 'http://172.17.0.1:3001/proxy/src/contrib/png_0.1-8.tar.gz'
#12 179.1 Content type 'application/x-gzip' length 24880 bytes (24 KB)


Which indicates that the order is:

CRAN, C, A, B...

Note that A comes before B in the URL accesses, whereas I was expecting
either CRAN, C, B, A if it is physical order, or A, B, C, CRAN if it is
alphabetical.

As an alternative, given:

options(
  repos = c(
    C = "http://172.17.0.1:3002",
    B = "http://172.17.0.1:3001/proxy",
    A = "http://localhost:3002",
    CRAN = "http://localhost:3001/proxy"
  )
)


The order is:

#12 0.485 Warning: unable to access index for repository http://172.17.0.1:3002/src/contrib:
#12 0.485   cannot open URL 'http://172.17.0.1:3002/src/contrib/PACKAGES'
#12 1.153 Warning: unable to access index for repository http://localhost:3002/src/contrib:
#12 1.153   cannot open URL 'http://localhost:3002/src/contrib/PACKAGES'
#12 1.153 Warning: unable to access index for repository http://localhost:3001/proxy/src/contrib:
#12 1.153   cannot open URL 'http://localhost:3001/proxy/src/contrib/PACKAGES'
#12 1.250 trying URL 'http://172.17.0.1:3001/proxy/src/contrib/rlang_1.1.3.tar.gz'


Which seems to be C, A, CRAN, B.

What is it about B?

The help doesn't talk about this.  It says:

repos:
character vector of repository URLs for use by available.packages and
related functions. Initially set from entries marked as default in the
‘repositories’ file, whose path is configurable via environment variable
R_REPOSITORIES (set this to NULL to skip initialization at startup). The
‘factory-fresh’ setting from the file in R.home("etc") is c(CRAN="@CRAN@"),
a value that causes some utilities to prompt for a CRAN mirror. To avoid
this do set the CRAN mirror, by something like


local({
r <- getOption("repos")
r["CRAN"] <- "https://my.local.cran"
options(repos = r)
})
in your ‘.Rprofile’, or use a personal ‘repositories’ file.


Note that you can add more repositories (Bioconductor, R-Forge, RForge.net,
...) for the current session using setRepositories.


Now I am not setting the values in exactly the way that the manual says, so
I experimented in case something was wrong there:

> options('repos')$repos
                         CRAN
"https://cloud.r-project.org"
> local({
+   r <- getOption("repos")
+   r["CRAN"] <- "https://my.local.cran"
+   options(repos = r)
+ })
> options('repos')$repos
                   CRAN
"https://my.local.cran"
> str(options('repos'))
List of 1
 $ repos: Named chr "https://my.local.cran"
  ..- attr(*, "names")= chr "CRAN"
> local({
+   r <- getOption("repos")
+   r["CRAN"] <- "https://my.local.cran"
+   options(repos = r)
+ })
> options(
+   repos = c(
+     C = "http://172.17.0.1:3002",
+     B = "http://172.17.0.1:3001/proxy",
+     A = "http://localhost:3002",
+     CRAN = "http://localhost:3001/proxy"
+   )
+ )
> options('repos')$repos
                             C                              B                              A                           CRAN
      "http://172.17.0.1:3002" "http://172.17.0.1:3001/proxy"        "http://localhost:3002"  "http://localhost:3001/proxy"
> str(options('repos'))
List of 1
 $ repos: Named chr [1:4] "http://172.17.0.1:3002" "http://172.17.0.1:3001/proxy" "http://localhost:3002" "http://localhost:3001/proxy"
  ..- attr(*, "names")= chr [1:4] "C" "B" "A" "CRAN"
> local({
+   r <- getOption("repos")
+   r["CRAN"] <- "https://my.local.cran"
+   r["C"] = "http://172.17.0.1:3002"
+   r["B"] = "http://172.17.0.1:3001/proxy"
+   r["A"] = "http://localhost:3002"
+   r["CRAN"] = "http://localhost:3001/proxy"
+   options(repos = r)
+ })
> str(options('repos'))
List of 1
 $ repos: Named chr [1:4] "http://172.17.0.1:3002" "http://172.17.0.1:3001/proxy" "http://localhost:3002" "http://localhost:3001/proxy"
  ..- attr(*, "names")= chr [1:4] "C" "B" "A" "CRAN"
> options('repos')$repos
                             C                              B                              A                           CRAN
      "http://172.17.0.1:3002" "http://172.17.0.1:3001/proxy"        "http://localhost:3002"  "http://localhost:3001/proxy"


So I