Re: [R-pkg-devel] suggestion: conda for third-party software
It would also be worth looking at the basilisk package:

https://github.com/LTLA/basilisk

where the approach used there is to instead embed a Conda installation as part of the R package itself. This comes with the benefit that it's now the package author's responsibility to maintain the Conda installation (not CRAN nor the users), but does have the drawback that installing or upgrading that Conda environment may become more challenging.

One other large benefit of this approach is that it forces R package authors who want to use Python through reticulate to standardize on the same environment. Note that reticulate can only bind to a single Python session per R session, so attempting to have R packages which use incompatible Python dependencies could quickly become an issue. (Python packages tend to rely on virtual environments, and so Python packages tend to declare more narrow dependency version requirements.) Hence, having a "standardized" Python environment that can be used by R packages through reticulate (or other Python-wrapping packages) should be very useful.

If you're curious, there's a more detailed discussion here:

https://github.com/LTLA/basilisk/issues/2

Best,
Kevin

On Wed, Jan 8, 2020 at 8:34 AM Kevin Ushey wrote:
>
> On Tue, Jan 7, 2020 at 10:42 PM Sokol Serguei wrote:
> >
> > Thanks for this hint.
> >
> > On 07/01/2020 at 20:47, Kevin Ushey wrote:
> > > The newest version of reticulate does something very similar: R
> > > packages can declare their Python package dependencies in the
> > > Config/reticulate field of a DESCRIPTION file, and reticulate can read
> > > and use those dependencies to provision a Python environment for the
> > > user when requested (currently using Miniconda).
> >
> > If miniconda is used, does it mean that not only Python but any conda
> > package can be indicated in dependency ?
>
> In theory yes, but reticulate only accepts Python package dependencies
> since its primary goal is interoperation with Python.
Re: [R-pkg-devel] suggestion: conda for third-party software
On Tue, Jan 7, 2020 at 10:42 PM Sokol Serguei wrote:
>
> Thanks for this hint.
>
> On 07/01/2020 at 20:47, Kevin Ushey wrote:
> > The newest version of reticulate does something very similar: R
> > packages can declare their Python package dependencies in the
> > Config/reticulate field of a DESCRIPTION file, and reticulate can read
> > and use those dependencies to provision a Python environment for the
> > user when requested (currently using Miniconda).
>
> If miniconda is used, does it mean that not only Python but any conda
> package can be indicated in dependency ?

In theory yes, but reticulate only accepts Python package dependencies since its primary goal is interoperation with Python.

> And another question, do you know if miniconda is installed on testing
> CRAN machines? (Without this I cannot see how your packages with conda
> dependencies could be tested during their submission.)

I don't think so. I can't speak for CRAN, but their time is precious and it seems unlikely to me that they would be willing to expend the time needed to maintain Conda installations across their fleet of CRAN machines.

Packages using Miniconda in this way could still run their tests on different types of infrastructure, though (e.g. Travis CI).

> Best,
> Serguei.
>
> > Similarly, rather than having this part of SystemRequirements, package
> > authors could declare these in a separate field called e.g.
> > Config/conda. Then, you could have an R package that knows how to read
> > and parse these configuration requests, and install those packages for
> > the user.
> >
> > That said, maintaining a Conda installation and its environments is
> > non-trivial, and things do not always work as expected when mixing
> > Conda applications with non-Conda applications. Most notably, Conda
> > installations bundle their own copies of libraries; e.g. the C++
> > standard library, Qt, OpenSSL, and so on.
__
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel
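To make the Config/conda idea from this thread a bit more concrete, here is a minimal Python sketch of the "read and parse" step. Note that Config/conda is only a suggested field name in this discussion, not something R or reticulate is known to read. The comma-separated value format, the helper names `read_dcf_fields` and `conda_packages`, and the sample DESCRIPTION are all assumptions made for illustration.

```python
def read_dcf_fields(text):
    """Parse 'Field: value' records; indented lines continue the previous field."""
    fields = {}
    current = None
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        if line[0] in " \t" and current is not None:
            # continuation line: append to the field being read
            fields[current] += " " + line.strip()
        elif ":" in line:
            name, _, value = line.partition(":")
            current = name.strip()
            fields[current] = value.strip()
    return fields


def conda_packages(description_text):
    """Return the entries of the hypothetical Config/conda field as a list."""
    value = read_dcf_fields(description_text).get("Config/conda", "")
    return [item.strip() for item in value.split(",") if item.strip()]


# Hypothetical DESCRIPTION content, used only to exercise the sketch.
SAMPLE_DESCRIPTION = """Package: mypkg
Version: 0.1.0
Config/conda: boost-cpp>=1.71,
    zlib
"""

if __name__ == "__main__":
    print(conda_packages(SAMPLE_DESCRIPTION))
```

A package without the field simply yields an empty list, which matches the point made in the thread that such a field would be optional.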
Re: [R-pkg-devel] suggestion: conda for third-party software
On 08/01/2020 at 08:50, Ivan Krylov wrote:
> While I appreciate the effort behind Anaconda, I would hate to see it
> being *required* to depend on third-party binaries compiled by a
> fourth-party (am I counting my parties right?) when there's already a
> copy installed and available via means the user trusts more (e.g. via
> GNU/Linux distro package, or Homebrew on macOS, or just a copy sitting
> in /usr/local installed manually from source).
>
> In this regard, a separate field like "Config/conda" suggested by Kevin
> Ushey sounds like a good idea: if one wants to use Anaconda, the field
> is there. If one doesn't, one can just ignore it and provide the
> necessary dependencies in a different way.

The same would apply to my proposal: if you want, you use conda:something; if not, you do as before. But anyway, I am not campaigning for a 'conda:' tag in SystemRequirements. Kevin's Config/conda solution seems to be sufficient for this issue. I just was not aware that it was already there.

Best,
Serguei.
Re: [R-pkg-devel] suggestion: conda for third-party software
On Tue, 7 Jan 2020 15:49:45 +0100 Serguei Sokol wrote:

> Currently, many R packages include TPS as part of them thus bloating
> their sizes and often duplicating files on a given system. And even
> when TPS is not included in an R package but is just installed on a
> system, it is not so obvious to get the right path to it. Sometimes
> pkg-config helps but it is not always present.

I agree that making a package depend on a third-party library means finding oneself in a bit of a pickle. A really popular library like cURL could be "just" depended upon (for the price of some problems when building on Windows). A really small (e.g. 3 source files) and rarely updated (just once last year) library like liborigin could "just" be bundled (but the package maintainer would have to constantly watch out for new versions of the library).

Finding that the bundled version of a network-facing library in an R package (e.g. libuv in httpuv) is several minor versions out of date is always a bit scary, even if it turns out that no major security flaws have been found in that version (just a few low-probability resource leaks, one unlikely NULL pointer dereference and some portability problems). The road to dependency hell is paved with intentions of code reuse.

> So, the new feature would be to let R package developers to write in
> DESCRIPTION/SystemRequirements field something like
> 'conda:boost-cpp>=1.71' where 'boost-cpp' is an example of a conda
> package and '>=1.71' is an optional version requirement.

While I appreciate the effort behind Anaconda, I would hate to see it being *required* to depend on third-party binaries compiled by a fourth-party (am I counting my parties right?) when there's already a copy installed and available via means the user trusts more (e.g. via GNU/Linux distro package, or Homebrew on macOS, or just a copy sitting in /usr/local installed manually from source).

In this regard, a separate field like "Config/conda" suggested by Kevin Ushey sounds like a good idea: if one wants to use Anaconda, the field is there. If one doesn't, one can just ignore it and provide the necessary dependencies in a different way.

-- 
Best regards,
Ivan
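The "use the copy the user already trusts, fall back to conda only if asked" ordering described above could be sketched roughly as follows in Python. This is an illustration under stated assumptions, not an existing tool: the function names and the `probe` parameter are hypothetical, and only the `pkg-config --cflags --libs` call is a real command-line interface.

```python
import shutil
import subprocess


def find_with_pkg_config(module):
    """Return the flags reported by pkg-config for module, or None if unavailable."""
    if shutil.which("pkg-config") is None:
        return None  # pkg-config itself is not installed
    result = subprocess.run(
        ["pkg-config", "--cflags", "--libs", module],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        return None  # pkg-config does not know this module
    return result.stdout.strip()


def resolve_dependency(module, conda_spec=None, probe=find_with_pkg_config):
    """Prefer a system copy; fall back to an optional conda spec; else report missing.

    probe is injectable so the decision logic can be tested without pkg-config.
    """
    flags = probe(module)
    if flags is not None:
        return ("system", flags)
    if conda_spec is not None:
        return ("conda", conda_spec)
    return ("missing", None)
```

Keeping the conda spec optional mirrors the point that a package author who does not want Anaconda can ignore the field and provide the dependency another way.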
Re: [R-pkg-devel] suggestion: conda for third-party software
Thanks for this hint.

On 07/01/2020 at 20:47, Kevin Ushey wrote:
> The newest version of reticulate does something very similar: R
> packages can declare their Python package dependencies in the
> Config/reticulate field of a DESCRIPTION file, and reticulate can read
> and use those dependencies to provision a Python environment for the
> user when requested (currently using Miniconda).

If miniconda is used, does it mean that not only Python packages but any conda package can be declared as a dependency?

And another question: do you know if miniconda is installed on the CRAN testing machines? (Without this I cannot see how your packages with conda dependencies could be tested during their submission.)

Best,
Serguei.

> Similarly, rather than having this part of SystemRequirements, package
> authors could declare these in a separate field called e.g.
> Config/conda. Then, you could have an R package that knows how to read
> and parse these configuration requests, and install those packages for
> the user.
>
> That said, maintaining a Conda installation and its environments is
> non-trivial, and things do not always work as expected when mixing
> Conda applications with non-Conda applications. Most notably, Conda
> installations bundle their own copies of libraries; e.g. the C++
> standard library, Qt, OpenSSL, and so on. If an application tries to
> mix and match both system-provided and Conda-provided libraries in the
> same process, bad things often happen. This was still the
> lowest-friction way forward for us with reticulate, but it's worth
> being aware that Conda is not a total panacea.
>
> Best,
> Kevin
>
> On Tue, Jan 7, 2020 at 6:50 AM Serguei Sokol wrote:
>> Best wishes for 2020!
>>
>> I would like to suggest a new feature for R package management. Its aim
>> is to enable package developers and end-users to rely on conda (
>> https://docs.conda.io/en/latest/ ) for managing third-party software
>> (TPS) on major platforms: linux64, win64 and osx64.
[R-pkg-devel] suggestion: conda for third-party software
Best wishes for 2020!

I would like to suggest a new feature for R package management. Its aim is to enable package developers and end-users to rely on conda ( https://docs.conda.io/en/latest/ ) for managing third-party software (TPS) on major platforms: linux64, win64 and osx64.

Currently, many R packages bundle TPS, which bloats their size and often duplicates files on a given system. And even when TPS is not included in an R package but is just installed on the system, it is not obvious how to get the right path to it. Sometimes pkg-config helps, but it is not always present.

So, the new feature would be to let R package developers write in the DESCRIPTION/SystemRequirements field something like 'conda:boost-cpp>=1.71', where 'boost-cpp' is an example of a conda package and '>=1.71' is an optional version requirement. Having this could allow install.packages() to install TPS on a testing CRAN machine or on an end-user's one. (There is just one line to execute in a shell: conda install . It will install the package itself as well as all its dependencies).

To my mind, this feature would have the following advantages:
- on-disk size economy, as the same TPS does not have to be included in the R package itself and can be shared with other language wrappers, e.g. Python;
- easy flag configuration in Makevars, as paths to TPS will be well known in advance;
- CRAN machines could test packages relying on a wide panel of TPS without bothering with their manual installation;
- TPS installation can become transparent for the end-user on major platforms.

Note that even though R itself is available through conda ( https://anaconda.org/conda-forge/r-base ), it is not mandatory to use conda's R version for this feature. Here, conda is just meant to facilitate access to TPS. However, a minimal requirement is obviously to have conda itself.

Does it look reasonable? Appealing?

Best,
Serguei.
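As a rough illustration of the proposal, the following Python sketch extracts 'conda:' entries from a SystemRequirements string and builds the corresponding `conda install` argument list. The 'conda:name>=version' syntax is the one suggested in this message and is not recognised by R today. The assumption that entries are comma-separated, the regular expression, and the function names are all hypothetical. The sketch only builds the command and does not run it.

```python
import re

# Matches e.g. "conda:boost-cpp>=1.71" or "conda:zlib" (proposed, hypothetical syntax).
_CONDA_ENTRY = re.compile(
    r"^conda:(?P<name>[A-Za-z0-9._-]+)"
    r"(?P<version>(?:>=|<=|==|!=|>|<|=)[0-9A-Za-z.*]+)?$"
)


def parse_conda_requirements(system_requirements):
    """Return (name, version_constraint) pairs for every 'conda:' entry."""
    specs = []
    for entry in system_requirements.split(","):
        match = _CONDA_ENTRY.match(entry.strip())
        if match:
            # version is optional; use an empty string when it is absent
            specs.append((match.group("name"), match.group("version") or ""))
    return specs


def conda_install_command(specs):
    """Build an argument list for 'conda install'; the caller decides whether to run it."""
    return ["conda", "install", "--yes"] + [name + version for name, version in specs]


if __name__ == "__main__":
    requirements = "C++11, conda:boost-cpp>=1.71, conda:zlib, GNU make"
    print(conda_install_command(parse_conda_requirements(requirements)))
```

Entries without the 'conda:' prefix are ignored, so existing free-text SystemRequirements values would keep working unchanged under this scheme.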