On 3/26/25 17:47, Sergio Oller wrote:
Hello,

I would like to submit a patch to R. Following the "Submitting Feature
Requests" chapter of the R Development Guide
<https://contributor.r-project.org/rdevguide/chapters/submitting_feature_requests.html>,
I would like to ask for feedback before proceeding with a formal
submission on Bugzilla. It is my first attempt at contributing to R, and I
do not currently have a Bugzilla account.

I work at a company where we use R with Databricks. We want to install
some packages on a distributed filesystem that is not fully POSIX
compliant, as it does not support opening files in append mode. In C terms,
`fopen(filename, "a")` gives an error. I guess other distributed
filesystems beyond the ones in Databricks may have issues with append mode
as well.
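
One way to check whether a given directory is affected is to probe it directly. The following is only a minimal Python sketch: the function name `supports_append` is hypothetical, and it tests nothing beyond opening a file for append and writing one line.

```python
import os
import tempfile


def supports_append(directory):
    """Return True if a file in `directory` can be opened in append mode and written."""
    # Create a scratch file first; if even this fails, the error propagates.
    fd, path = tempfile.mkstemp(dir=directory)
    os.close(fd)
    try:
        # This is the operation reported as failing on the distributed filesystem.
        with open(path, "a") as handle:
            handle.write("probe\n")
        return True
    except OSError:
        return False
    finally:
        # Best-effort cleanup of the scratch file.
        try:
            os.remove(path)
        except OSError:
            pass
```

Calling `supports_append()` on the library path and on a local directory should show whether append mode is what distinguishes the two.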

Our current workaround is to install all packages on a local folder, and
then copy/move the folder to the distributed file system.

This is something we try to keep working in R if possible, to allow users to move installed packages by moving the installation directories. If this practice works for you, it is probably fine.

Currently, installing a binary package just means unpacking it to the target directory. You could probably do this via binary packages as well: build binary packages on a local filesystem, and then install them to the non-POSIX filesystem (provided the unpacking/installation works on such a filesystem). If the installation of a binary package doesn't work but could be made to work (possibly optionally), that might be of interest.

If I understand package installation correctly, when a package is
installed, the installation happens inside a 00LOCK directory, and then the
outcome is moved to the final destination.

The contribution I would like to submit allows users/sysadmins to set an
environment variable named PKG_LOCKDIR_PREFIX, that defines the location
where the "00LOCK-" directories are created. The patch is backwards
compatible and it consists of +28,-10 lines, hopefully easy enough to
review.
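
To make the intended behaviour concrete, here is a rough Python sketch of how such a lookup could work. It is not taken from the patch: the function `lock_dir` and its fallback rule are assumptions for illustration, and only the variable name PKG_LOCKDIR_PREFIX and the "00LOCK-" prefix come from the description above.

```python
import os


def lock_dir(library, package, environ=os.environ):
    """Return the directory in which `package` would be staged (illustrative only)."""
    # Assumption: an unset or empty variable falls back to the library itself,
    # which is what keeps the behaviour backwards compatible.
    prefix = environ.get("PKG_LOCKDIR_PREFIX") or library
    return os.path.join(prefix, "00LOCK-" + package)
```

With the variable unset, the result is the lock directory inside the library, as today; with it set, the staging area moves to the given location.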

https://github.com/r-devel/r-svn/pull/196.diff

When I use this patch, I can successfully install packages on a distributed
filesystem by setting PKG_LOCKDIR_PREFIX to a directory in my local
filesystem (R performs all the file-append operations on the local
filesystem and finally copies the package files to the distributed
filesystem).

I am not excited about the idea of combining this with the locking mechanism and staged installation in the described way. The current implementation takes advantage of the fact that, on a single filesystem, a move operation is either atomic (POSIX) or at least very fast (Windows). Copying an installed package to a different filesystem is neither. There is a risk that another R session could see a partially installed package. And if the library were on a distributed filesystem accessed from different machines, concurrent installations from multiple machines could even corrupt it. In principle, this could happen even on a single machine (checking for the existence of a directory on one filesystem and creating it on another would not be atomic).

Perhaps the staging/locking could be implemented in some special way on the target filesystem, with some second-level staging and installation, but it is questionable whether that is worth the effort and maintenance in base R. Also keep in mind that this could hardly be tested regularly, as such filesystems are rare.

Best
Tomas

P.S.

about staged installation: https://developer.r-project.org/Blog/public/2019/02/14/staged-install/index.html



This setting makes package installation transparent for all data
scientists, since they may not even know that PKG_LOCKDIR_PREFIX has been
set. Package installation just works as expected.

I feel the patch has some added value over our workaround: Even if we
implement the workaround with a simple wrapper over install.packages(), any
third party package that depends on install.packages() (such as renv or
others) won't use our workaround. Besides, with this patch merged any other
R user benefits from being able to install packages in those filesystems.

Any feedback is very much appreciated.

Thanks for your time,

Sergio

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
