Re: [R-pkg-devel] suggestion: conda for third-party software

2020-01-08 Thread Kevin Ushey
It would also be worth looking at the basilisk package:

https://github.com/LTLA/basilisk

The approach there is instead to embed a Conda installation in the R
package itself. The benefit is that maintaining the Conda installation
becomes the package author's responsibility (not CRAN's or the
users'); the drawback is that installing or upgrading that Conda
environment may become more challenging.

One other large benefit of this approach is that it forces R package
authors who want to use Python through reticulate to standardize on
the same environment. Note that reticulate can only bind to a single
Python session per R session, so attempting to have R packages which
use incompatible Python dependencies could quickly become an issue.
(Python packages tend to rely on virtual environments, and so tend to
declare narrower dependency version requirements.)
Hence, having a "standardized" Python environment that can be used by
R packages through reticulate (or other Python-wrapping packages)
should be very useful.
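
To make the single-session constraint concrete, here is a minimal
sketch in Python of how conflicting pins could be detected before
anything is loaded. The package names (pkgA, pkgB), the pinned versions
and the helper find_conflicts are all hypothetical; this is not
something reticulate or basilisk provides.

```python
from collections import defaultdict


def find_conflicts(declared):
    """Return Python packages that are pinned to more than one version.

    `declared` maps a (hypothetical) R package name to the exact Python
    pins it asks for, e.g. {"pkgA": {"numpy": "1.17.4"}}.
    """
    pins = defaultdict(set)
    for requirements in declared.values():
        for py_package, version in requirements.items():
            pins[py_package].add(version)
    # Only one Python environment can be bound per R session, so any
    # package with two distinct pins cannot be satisfied for both users.
    return {name: sorted(versions)
            for name, versions in pins.items()
            if len(versions) > 1}


if __name__ == "__main__":
    declared = {
        "pkgA": {"numpy": "1.17.4", "scipy": "1.4.1"},
        "pkgB": {"numpy": "1.15.0"},
    }
    print(find_conflicts(declared))
```

A shared, standardized environment sidesteps this check altogether,
since every R package then sees the same pins.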

If you're curious, there's a more detailed discussion here:

https://github.com/LTLA/basilisk/issues/2

Best,
Kevin

On Wed, Jan 8, 2020 at 8:34 AM Kevin Ushey wrote:
>
> On Tue, Jan 7, 2020 at 10:42 PM Sokol Serguei wrote:
> >
> > Thanks for this hint.
> >
> > On 07/01/2020 at 20:47, Kevin Ushey wrote:
> > > The newest version of reticulate does something very similar: R
> > > packages can declare their Python package dependencies in the
> > > Config/reticulate field of a DESCRIPTION file, and reticulate can read
> > > and use those dependencies to provision a Python environment for the
> > > user when requested (currently using Miniconda).
> >
> > If miniconda is used, does it mean that not only Python but any conda
> > package can be indicated in dependency ?
>
> In theory yes, but reticulate only accepts Python package dependencies
> since its primary goal is interoperation with Python.
>
> > And another question, do you know if miniconda is installed on testing
> > CRAN machines? (Without this I cannot see how your packages with conda
> > dependencies could be tested during their submission.)
>
> I don't think so. I can't speak for CRAN, but their time is precious
> and it seems unlikely to me that they would be willing to expend the
> time needed to maintain Conda installations across their fleet of CRAN
> machines.
>
> Packages using Miniconda in this way could still run their tests on
> different types of infrastructure, though (e.g. Travis CI).
>
> > Best,
> >
> > Serguei.
> >
> > >
> > > Similarly, rather than having this part of SystemRequirements, package
> > > authors could declare these in a separate field called e.g.
> > > Config/conda. Then, you could have an R package that knows how to read
> > > and parse these configuration requests, and install those packages for
> > > the user.
> > >
> > > That said, maintaining a Conda installation and its environments is
> > > non-trivial, and things do not always work as expected when mixing
> > > Conda applications with non-Conda applications. Most notably, Conda
> > > installations bundle their own copies of libraries; e.g. the C++
> > > standard library, Qt, OpenSSL, and so on. If an application tries to
> > > mix and match both system-provided and Conda-provided libraries in the
> > > same process, bad things often happen. This was still the
> > > lowest-friction way forward for us with reticulate, but it's worth
> > > being aware that Conda is not a total panacea.
> > >
> > > Best,
> > > Kevin
> > >
> > > On Tue, Jan 7, 2020 at 6:50 AM Serguei Sokol wrote:
> > >> Best wishes for 2020!
> > >>
> > >> I would like to suggest a new feature for R package management. Its aim
> > >> is to enable package developers and end-users to rely on conda (
> > >> https://docs.conda.io/en/latest/ ) for managing third-party software
> > >> (TPS) on major platforms: linux64, win64 and osx64. Currently, many R
> > >> packages include TPS as part of them thus bloating their sizes and often
> > >> duplicating files on a given system.  And even when TPS is not included
> > >> in an R package but is just installed on a system, it is not so obvious
> > >> to get the right path to it. Sometimes pkg-config helps but it is not
> > >> always present.
> > >>
> > >> So, the new feature would be to let R package developers to write in
> > >> DESCRIPTION/SystemRequirements field something like
> > >> 'conda:boost-cpp>=1.71' where 'boost-cpp' is an example of a conda
> > >> package and '>=1.71' is an optional version requirement. Having this
> > >> could allow install.packages() to install TPS on a testing CRAN machine
> > >> or on an end-user's one. (There is just one line to execute in a shell:
> > >> conda install . It will install the package itself as well as
> > >> all its dependencies).
> > >>
> > >> To my mind, this feature would have the following advantages:
> > >>- on-disk size economy 

Re: [R-pkg-devel] suggestion: conda for third-party software

2020-01-08 Thread Kevin Ushey
On Tue, Jan 7, 2020 at 10:42 PM Sokol Serguei wrote:
>
> Thanks for this hint.
>
> On 07/01/2020 at 20:47, Kevin Ushey wrote:
> > The newest version of reticulate does something very similar: R
> > packages can declare their Python package dependencies in the
> > Config/reticulate field of a DESCRIPTION file, and reticulate can read
> > and use those dependencies to provision a Python environment for the
> > user when requested (currently using Miniconda).
>
> If miniconda is used, does it mean that not only Python but any conda
> package can be indicated in dependency ?

In theory yes, but reticulate only accepts Python package dependencies
since its primary goal is interoperation with Python.

> And another question, do you know if miniconda is installed on testing
> CRAN machines? (Without this I cannot see how your packages with conda
> dependencies could be tested during their submission.)

I don't think so. I can't speak for CRAN, but their time is precious
and it seems unlikely to me that they would be willing to expend the
time needed to maintain Conda installations across their fleet of CRAN
machines.

Packages using Miniconda in this way could still run their tests on
different types of infrastructure, though (e.g. Travis CI).

> Best,
>
> Serguei.
>
> >
> > Similarly, rather than having this part of SystemRequirements, package
> > authors could declare these in a separate field called e.g.
> > Config/conda. Then, you could have an R package that knows how to read
> > and parse these configuration requests, and install those packages for
> > the user.
> >
> > That said, maintaining a Conda installation and its environments is
> > non-trivial, and things do not always work as expected when mixing
> > Conda applications with non-Conda applications. Most notably, Conda
> > installations bundle their own copies of libraries; e.g. the C++
> > standard library, Qt, OpenSSL, and so on. If an application tries to
> > mix and match both system-provided and Conda-provided libraries in the
> > same process, bad things often happen. This was still the
> > lowest-friction way forward for us with reticulate, but it's worth
> > being aware that Conda is not a total panacea.
> >
> > Best,
> > Kevin
> >
> > On Tue, Jan 7, 2020 at 6:50 AM Serguei Sokol wrote:
> >> Best wishes for 2020!
> >>
> >> I would like to suggest a new feature for R package management. Its aim
> >> is to enable package developers and end-users to rely on conda (
> >> https://docs.conda.io/en/latest/ ) for managing third-party software
> >> (TPS) on major platforms: linux64, win64 and osx64. Currently, many R
> >> packages include TPS as part of them thus bloating their sizes and often
> >> duplicating files on a given system.  And even when TPS is not included
> >> in an R package but is just installed on a system, it is not so obvious
> >> to get the right path to it. Sometimes pkg-config helps but it is not
> >> always present.
> >>
> >> So, the new feature would be to let R package developers to write in
> >> DESCRIPTION/SystemRequirements field something like
> >> 'conda:boost-cpp>=1.71' where 'boost-cpp' is an example of a conda
> >> package and '>=1.71' is an optional version requirement. Having this
> >> could allow install.packages() to install TPS on a testing CRAN machine
> >> or on an end-user's one. (There is just one line to execute in a shell:
> >> conda install . It will install the package itself as well as
> >> all its dependencies).
> >>
> >> To my mind, this feature would have the following advantages:
> >>- on-disk size economy as the same TPS does not have to be included in
> >> R package itself and can be shared with other language wrappers, e.g.
> >> Python;
> >>- an easy flag configuring in Makevars as paths to TPS will be well
> >> known in advance;
> >>- CRAN machines could test packages relying on a wide panel of TPS
> >> without bothering with their manual installation;
> >>- TPS installation can become transparent for the end-user on major
> >> platforms;
> >>
> >> Note that even R is part of conda (
> >> https://anaconda.org/conda-forge/r-base ), it is not mandatory to use
> >> the conda's R version for this feature. Here, conda is just meant to
> >> facilitate access to TPS. However, a minimal requirement is obviously to
> >> have conda itself.
> >>
> >> Does it look reasonable? appealing?
> >> Best,
> >> Serguei.
> >>
> >> __
> >> R-package-devel@r-project.org mailing list
> >> https://stat.ethz.ch/mailman/listinfo/r-package-devel
>
>

__
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel


Re: [R-pkg-devel] suggestion: conda for third-party software

2020-01-08 Thread Serguei Sokol

On 08/01/2020 at 08:50, Ivan Krylov wrote:
> On Tue, 7 Jan 2020 15:49:45 +0100
> Serguei Sokol wrote:
>
> > Currently, many R packages include TPS as part of them thus bloating
> > their sizes and often duplicating files on a given system.  And even
> > when TPS is not included in an R package but is just installed on a
> > system, it is not so obvious to get the right path to it. Sometimes
> > pkg-config helps but it is not always present.
>
> I agree that making a package depend on a third-party library means
> finding oneself in a bit of a pickle. A really popular library like
> cURL could be "just" depended upon (for the price of some problems when
> building on Windows). A really small (e.g. 3 source files) and rarely
> updated (just once last year) library like liborigin could "just" be
> bundled (but the package maintainer would have to constantly watch out
> for new versions of the library). Finding that the bundled version of a
> network-facing library in an R package (e.g. libuv in httpuv) is several
> minor versions out of date is always a bit scary, even if it turns out
> that no major security flaws have been found in that version (just a few
> low-probability resource leaks, one unlikely NULL pointer dereference
> and some portability problems). The road to dependency hell is paved
> with intentions of code reuse.
>
> > So, the new feature would be to let R package developers to write in
> > DESCRIPTION/SystemRequirements field something like
> > 'conda:boost-cpp>=1.71' where 'boost-cpp' is an example of a conda
> > package and '>=1.71' is an optional version requirement.
>
> While I appreciate the effort behind Anaconda, I would hate to see it
> being *required* to depend on third-party binaries compiled by a
> fourth-party (am I counting my parties right?) when there's already a
> copy installed and available via means the user trusts more (e.g. via
> GNU/Linux distro package, or Homebrew on macOS, or just a copy sitting
> in /usr/local installed manually from source). In this regard, a
> separate field like "Config/conda" suggested by Kevin Ushey sounds like
> a good idea: if one wants to use Anaconda, the field is there. If one
> doesn't, one can just ignore it and provide the necessary dependencies
> in a different way.

The same would apply to my proposal: if you want to, you use
conda:something; if not, you carry on as before. In any case, I am not
campaigning for a 'conda:' tag in SystemRequirements. Kevin's
Config/conda solution seems sufficient for this issue. I just was not
aware that it was already there.


Best,
Serguei.


Re: [R-pkg-devel] suggestion: conda for third-party software

2020-01-07 Thread Ivan Krylov
On Tue, 7 Jan 2020 15:49:45 +0100
Serguei Sokol wrote:

> Currently, many R packages include TPS as part of them thus bloating
> their sizes and often duplicating files on a given system.  And even
> when TPS is not included in an R package but is just installed on a
> system, it is not so obvious to get the right path to it. Sometimes
> pkg-config helps but it is not always present.

I agree that making a package depend on a third-party library means
finding oneself in a bit of a pickle. A really popular library like
cURL could be "just" depended upon (for the price of some problems when
building on Windows). A really small (e.g. 3 source files) and rarely
updated (just once last year) library like liborigin could "just" be
bundled (but the package maintainer would have to constantly watch out
for new versions of the library). Finding that the bundled version of a
network-facing library in an R package (e.g. libuv in httpuv) is several
minor versions out of date is always a bit scary, even if it turns out
that no major security flaws have been found in that version (just a few
low-probability resource leaks, one unlikely NULL pointer dereference
and some portability problems). The road to dependency hell is paved
with intentions of code reuse.

> So, the new feature would be to let R package developers to write in 
> DESCRIPTION/SystemRequirements field something like 
> 'conda:boost-cpp>=1.71' where 'boost-cpp' is an example of a conda 
> package and '>=1.71' is an optional version requirement.

While I appreciate the effort behind Anaconda, I would hate to see it
being *required* to depend on third-party binaries compiled by a
fourth-party (am I counting my parties right?) when there's already a
copy installed and available via means the user trusts more (e.g. via
GNU/Linux distro package, or Homebrew on macOS, or just a copy sitting
in /usr/local installed manually from source). In this regard, a
separate field like "Config/conda" suggested by Kevin Ushey sounds like
a good idea: if one wants to use Anaconda, the field is there. If one
doesn't, one can just ignore it and provide the necessary dependencies
in a different way.
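
As a rough illustration of such an opt-in field, the sketch below (in
Python, purely for brevity) reads a hypothetical "Config/conda" entry
from the text of a DESCRIPTION file and returns nothing at all when the
user prefers to supply the dependencies another way. The field name
follows the suggestion in this thread; its comma-separated syntax and
both helper functions are assumptions, not an existing tool.

```python
def read_dcf_fields(text):
    """Parse 'Field: value' records, joining indented continuation lines."""
    fields, current = {}, None
    for line in text.splitlines():
        if line[:1] in (" ", "\t") and current is not None:
            # Continuation of the previous field.
            fields[current] += " " + line.strip()
        elif ":" in line:
            current, value = line.split(":", 1)
            fields[current] = value.strip()
    return fields


def conda_requests(description, use_conda):
    """List the conda packages requested, or [] if the user opted out."""
    if not use_conda:
        return []  # dependencies are provided some other way
    raw = read_dcf_fields(description).get("Config/conda", "")
    return [item.strip() for item in raw.split(",") if item.strip()]


if __name__ == "__main__":
    description = (
        "Package: demo\n"
        "Version: 0.1\n"
        "Config/conda: boost-cpp>=1.71,\n"
        "    zlib\n"
    )
    print(conda_requests(description, use_conda=True))
    print(conda_requests(description, use_conda=False))
```

Because the field is only read on request, a system copy from a distro
package, Homebrew or /usr/local remains usable without conda being
involved.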

-- 
Best regards,
Ivan


Re: [R-pkg-devel] suggestion: conda for third-party software

2020-01-07 Thread Sokol Serguei

Thanks for this hint.

On 07/01/2020 at 20:47, Kevin Ushey wrote:
> The newest version of reticulate does something very similar: R
> packages can declare their Python package dependencies in the
> Config/reticulate field of a DESCRIPTION file, and reticulate can read
> and use those dependencies to provision a Python environment for the
> user when requested (currently using Miniconda).


If Miniconda is used, does that mean any conda package, not only
Python packages, can be declared as a dependency?

And another question: do you know whether Miniconda is installed on the
CRAN testing machines? (Without it, I cannot see how your packages with
conda dependencies could be tested during submission.)


Best,

Serguei.



> Similarly, rather than having this part of SystemRequirements, package
> authors could declare these in a separate field called e.g.
> Config/conda. Then, you could have an R package that knows how to read
> and parse these configuration requests, and install those packages for
> the user.
>
> That said, maintaining a Conda installation and its environments is
> non-trivial, and things do not always work as expected when mixing
> Conda applications with non-Conda applications. Most notably, Conda
> installations bundle their own copies of libraries; e.g. the C++
> standard library, Qt, OpenSSL, and so on. If an application tries to
> mix and match both system-provided and Conda-provided libraries in the
> same process, bad things often happen. This was still the
> lowest-friction way forward for us with reticulate, but it's worth
> being aware that Conda is not a total panacea.
>
> Best,
> Kevin
>
> On Tue, Jan 7, 2020 at 6:50 AM Serguei Sokol wrote:
>
> > Best wishes for 2020!
> >
> > I would like to suggest a new feature for R package management. Its aim
> > is to enable package developers and end-users to rely on conda (
> > https://docs.conda.io/en/latest/ ) for managing third-party software
> > (TPS) on major platforms: linux64, win64 and osx64. Currently, many R
> > packages include TPS as part of them thus bloating their sizes and often
> > duplicating files on a given system.  And even when TPS is not included
> > in an R package but is just installed on a system, it is not so obvious
> > to get the right path to it. Sometimes pkg-config helps but it is not
> > always present.
> >
> > So, the new feature would be to let R package developers to write in
> > DESCRIPTION/SystemRequirements field something like
> > 'conda:boost-cpp>=1.71' where 'boost-cpp' is an example of a conda
> > package and '>=1.71' is an optional version requirement. Having this
> > could allow install.packages() to install TPS on a testing CRAN machine
> > or on an end-user's one. (There is just one line to execute in a shell:
> > conda install . It will install the package itself as well as
> > all its dependencies).
> >
> > To my mind, this feature would have the following advantages:
> >  - on-disk size economy as the same TPS does not have to be included in
> > R package itself and can be shared with other language wrappers, e.g.
> > Python;
> >  - an easy flag configuring in Makevars as paths to TPS will be well
> > known in advance;
> >  - CRAN machines could test packages relying on a wide panel of TPS
> > without bothering with their manual installation;
> >  - TPS installation can become transparent for the end-user on major
> > platforms;
> >
> > Note that even R is part of conda (
> > https://anaconda.org/conda-forge/r-base ), it is not mandatory to use
> > the conda's R version for this feature. Here, conda is just meant to
> > facilitate access to TPS. However, a minimal requirement is obviously to
> > have conda itself.
> >
> > Does it look reasonable? appealing?
> > Best,
> > Serguei.


[R-pkg-devel] suggestion: conda for third-party software

2020-01-07 Thread Serguei Sokol

Best wishes for 2020!

I would like to suggest a new feature for R package management. Its aim 
is to enable package developers and end-users to rely on conda ( 
https://docs.conda.io/en/latest/ ) for managing third-party software 
(TPS) on major platforms: linux64, win64 and osx64. Currently, many R
packages bundle TPS, which bloats their size and often duplicates
files on a given system. And even when TPS is not bundled with an R
package but simply installed on the system, finding the right path to
it is not obvious. pkg-config sometimes helps, but it is not always
present.


So, the new feature would be to let R package developers write, in the
SystemRequirements field of DESCRIPTION, something like
'conda:boost-cpp>=1.71' where 'boost-cpp' is an example of a conda 
package and '>=1.71' is an optional version requirement. Having this 
could allow install.packages() to install TPS on a testing CRAN machine 
or on an end-user's one. (There is just one line to execute in a shell:
conda install <package>. It will install the package itself as well as
all its dependencies.)
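
To illustrate what reading such an entry might involve, here is a
minimal sketch in Python. The 'conda:name>=version' syntax is only the
one proposed above, not something R or conda defines today, and the
names CondaRequirement and parse_system_requirements are hypothetical.

```python
import re
from typing import List, NamedTuple, Optional


class CondaRequirement(NamedTuple):
    name: str
    operator: Optional[str]
    version: Optional[str]


# Hypothetical syntax from the proposal: conda:<name>[<op><version>]
_CONDA_ENTRY = re.compile(
    r"^conda:(?P<name>[A-Za-z0-9_.\-]+)"
    r"(?:\s*(?P<op>>=|<=|==|>|<)\s*(?P<version>[0-9][A-Za-z0-9_.]*))?$"
)


def parse_system_requirements(field: str) -> List[CondaRequirement]:
    """Extract conda entries from a comma-separated SystemRequirements value."""
    requirements = []
    for item in field.split(","):
        match = _CONDA_ENTRY.match(item.strip())
        if match:  # entries without the conda: prefix are left alone
            requirements.append(
                CondaRequirement(match["name"], match["op"], match["version"])
            )
    return requirements


if __name__ == "__main__":
    field = "GNU make, conda:boost-cpp>=1.71, conda:zlib"
    for requirement in parse_system_requirements(field):
        print(requirement)
```

Entries that do not start with 'conda:' pass through untouched, so
existing free-form SystemRequirements text would keep working.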


To my mind, this feature would have the following advantages:
 - disk space savings, since the same TPS need not be bundled in the
R package itself and can be shared with other language wrappers, e.g.
Python;
 - easy configuration of flags in Makevars, since the paths to TPS are
known in advance;
 - CRAN machines could test packages relying on a wide range of TPS
without bothering with manual installation;
 - TPS installation can become transparent for the end-user on major
platforms.
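
As a rough sketch of the Makevars point, the Python snippet below
derives PKG_CPPFLAGS and PKG_LIBS from a conda prefix. It assumes the
Unix layout of a conda environment (include/ and lib/ directly under
the prefix; the layout differs on Windows), the helper makevars_flags
is hypothetical, and /opt/conda and boost_system are placeholder
values.

```python
import os
from pathlib import PurePosixPath


def makevars_flags(prefix, libs):
    """Build Makevars-style flags for libraries under a conda prefix.

    Assumes the Unix layout: <prefix>/include and <prefix>/lib.
    """
    root = PurePosixPath(prefix)
    link_flags = ["-L{}".format(root / "lib")]
    link_flags += ["-l{}".format(lib) for lib in libs]
    return {
        "PKG_CPPFLAGS": "-I{}".format(root / "include"),
        "PKG_LIBS": " ".join(link_flags),
    }


if __name__ == "__main__":
    # CONDA_PREFIX is set by conda when an environment is activated;
    # /opt/conda is only a placeholder fallback for this sketch.
    prefix = os.environ.get("CONDA_PREFIX", "/opt/conda")
    for key, value in makevars_flags(prefix, ["boost_system"]).items():
        print("{} = {}".format(key, value))
```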


Note that although R itself is available through conda (
https://anaconda.org/conda-forge/r-base ), using conda's build of R is
not mandatory for this feature. Here, conda is just meant to
facilitate access to TPS. However, a minimal requirement is obviously
to have conda itself installed.


Does this look reasonable? Appealing?
Best,
Serguei.