On 3/26/25 17:47, Sergio Oller wrote:
Hello,
I would like to submit a patch to R. Following chapter 5 of the R
Development Guide, "Submitting Feature Requests"
<https://contributor.r-project.org/rdevguide/chapters/submitting_feature_requests.html>,
I would like to ask for feedback before proceeding with a formal
submission on Bugzilla. It is my first attempt at contributing to R and I
do not currently have a Bugzilla account.
I am working at a company, and we use R with Databricks. We want to install
some packages on a distributed filesystem that is not fully POSIX
compliant, as it does not support opening files in append mode. In C terms,
`fopen(filename, "a")` gives an error. I guess other distributed file
systems beyond the ones in Databricks may have issues with append mode as
well.
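A quick way to check whether a given directory is affected is to probe it
directly. The following is only a minimal sketch in Python; the function
name `supports_append` is made up for illustration, and it assumes that the
failure shows up as an OS-level error when the file is opened, written, or
closed in append mode.

```python
import os
import tempfile


def supports_append(directory):
    """Return True if a file in `directory` can be opened and written in append mode."""
    # Create the scratch file first, so that a failure below points at append
    # mode rather than at plain file creation.
    fd, path = tempfile.mkstemp(dir=directory, prefix="append-probe-")
    os.close(fd)
    try:
        # Some filesystems may only report the error on write or close,
        # so the whole open/write/close cycle stays inside the try block.
        with open(path, "a") as handle:
            handle.write("probe\n")
        return True
    except OSError:
        return False
    finally:
        os.remove(path)
```

Calling `supports_append()` on the library directory before installing
would tell a wrapper script whether it needs the local-folder workaround.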
Our current workaround is to install all packages on a local folder, and
then copy/move the folder to the distributed file system.
This is something we try to keep working in R if possible, to allow
users to move installed packages by moving the installation directories.
If this practice works for you, it is probably fine.
Currently, installing a binary package just means unpacking it to the
target directory. Probably you could do this also via binary packages:
build binary packages on a local filesystem, and then install them to
the non-POSIX filesystem (provided the unpacking/installation would work
on such a filesystem). If the installation of a binary package doesn't
work but could be made to work (possibly as an option), that might be of
interest.
If I understand package installation correctly, when a package is
installed, the installation happens inside a 00LOCK directory, and then the
outcome is moved to the final destination.
The contribution I would like to submit allows users/sysadmins to set an
environment variable named PKG_LOCKDIR_PREFIX, that defines the location
where the "00LOCK-" directories are created. The patch is backwards
compatible and it consists of +28,-10 lines, hopefully easy enough to
review.
https://github.com/r-devel/r-svn/pull/196.diff
When I use this patch, I can successfully install packages on a distributed
file system by setting PKG_LOCKDIR_PREFIX to a directory in my local
filesystem (R does all the file append operations in the local file system,
and finally copies all the package files to the distributed file system).
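To make the proposed behaviour concrete, here is a rough sketch in Python
of how the location of the "00LOCK-" directory could be chosen. It is not
the code from the patch: the function name `lock_dir` and the fallback rule
(use the library directory when the variable is unset or empty) are
assumptions made for illustration only.

```python
import os


def lock_dir(library, package, environ=os.environ):
    """Pick the directory that would hold the "00LOCK-<package>" staging area."""
    # Assumed rule: use PKG_LOCKDIR_PREFIX when it is set and non-empty,
    # otherwise fall back to the library directory itself.
    prefix = environ.get("PKG_LOCKDIR_PREFIX") or library
    return os.path.join(prefix, "00LOCK-" + package)
```

With the variable unset, the lock directory stays next to the installed
packages, which is what keeps the change backwards compatible.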
I am not excited about the idea of combining this with the locking
mechanism and staged installation in the described way. The current
implementation takes advantage of the fact that, on a single filesystem, a
move operation is either atomic (POSIX) or at least very fast (Windows).
Copying an installed package to a different filesystem isn't. There is a
risk that some other R session could see a partial installation of a
package. Then, if the library was on a distributed filesystem accessed
from different machines, there could even be corruption due to
concurrent installation from multiple machines. In principle, this could
happen even on a single machine (checking existence of a directory on one
filesystem and creating it on another wouldn't be atomic).
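The difference between the two cases can be illustrated with a small
Python sketch (not R's actual implementation; the function names are
invented here). On POSIX systems a rename across filesystems fails with
EXDEV, and the only alternative is a copy, during which other processes can
observe a partially written directory.

```python
import errno
import os


def on_same_filesystem(path_a, path_b):
    """Compare device IDs; equal IDs mean both paths live on one filesystem."""
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev


def publish_by_rename(staged_dir, final_dir):
    """Move staged_dir to final_dir with one rename; never fall back to copying."""
    try:
        os.rename(staged_dir, final_dir)
    except OSError as err:
        if err.errno == errno.EXDEV:
            # Different filesystems: only a copy would work, and a copy is
            # visible to other processes while it is still incomplete.
            return False
        raise
    return True
```

When `publish_by_rename()` returns False, the staging directory and the
library are on different filesystems, which is exactly the situation
created by pointing the lock directory at a local disk.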
Perhaps the staging/locking could be implemented in some special way on
the target filesystem, some second-level staging and installation - but
it is questionable whether it is worth the effort/maintenance in base R.
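For illustration only, such a second-level staging step might look like
the Python sketch below. The directory name "00STAGE-" and the function
name are hypothetical, and the sketch assumes that creating a directory and
renaming within the target filesystem are atomic there, which a non-POSIX
filesystem may not guarantee. It also assumes a fresh installation, with no
existing package directory to replace.

```python
import os
import shutil


def install_via_second_stage(local_staged_dir, library, package):
    """Copy a locally built package into the library, then rename it into place."""
    final_dir = os.path.join(library, package)
    # Hypothetical name for the staging directory on the target filesystem.
    second_stage = os.path.join(library, "00STAGE-" + package)
    # mkdir raises if another installer already holds this staging directory.
    os.mkdir(second_stage)
    try:
        staged_copy = os.path.join(second_stage, package)
        # The slow, non-atomic copy happens outside the final location.
        shutil.copytree(local_staged_dir, staged_copy)
        # Same filesystem as final_dir, so no data is copied at this step.
        os.rename(staged_copy, final_dir)
    finally:
        shutil.rmtree(second_stage, ignore_errors=True)
    return final_dir
```

Other sessions would then only ever see the package directory once the
final rename has happened, at the cost of extra logic that would need
testing on the filesystems in question.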
Also keep in mind this could hardly be regularly tested as such
filesystems are rare.
Best
Tomas
P.S.
about staged installation:
https://developer.r-project.org/Blog/public/2019/02/14/staged-install/index.html
This setting makes package installation transparent for all data
scientists, since they may not even know that PKG_LOCKDIR_PREFIX has been
set. Package installation just works as expected.
I feel the patch has some added value over our workaround: Even if we
implement the workaround with a simple wrapper over install.packages(), any
third party package that depends on install.packages() (such as renv or
others) won't use our workaround. Besides, with this patch merged any other
R user benefits from being able to install packages in those filesystems.
Any feedback is very much appreciated.
Thanks for your time,
Sergio
______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel