Not an answer, but a request from someone who often works behind firewalls and/or on machines not connected to the internet: please provide a way for the package to look for the data at a user-specified location, such as a local directory.
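
A minimal sketch of what such a lookup might look like. The helper name resolve_data_dir, the option MyPackage.data_dir and the environment variable MYPACKAGE_DATA_DIR are all hypothetical names chosen for illustration. An explicit argument wins, then the option, then the environment variable; only if none is set does it fall back to a per-user cache directory.

```r
# Hypothetical helper: decide where the package should look for its csv files.
# The option and environment variable names are made up for this sketch.
resolve_data_dir <- function(data_dir = NULL) {
  candidates <- c(
    data_dir,
    getOption("MyPackage.data_dir"),
    Sys.getenv("MYPACKAGE_DATA_DIR", unset = NA_character_)
  )
  # Drop unset (NA) and empty entries, keeping the priority order.
  candidates <- candidates[!is.na(candidates) & nzchar(candidates)]
  if (length(candidates) > 0) {
    return(normalizePath(candidates[[1]], mustWork = FALSE))
  }
  # Fallback: per-user cache directory (tools::R_user_dir needs R >= 4.0.0).
  tools::R_user_dir("MyPackage", which = "cache")
}
```

On an offline machine the user would copy the csv files by hand and then set, for example, options(MyPackage.data_dir = "/some/local/dir") before loading the data.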

Best,

Jan



On 14-02-2025 15:54, John Clarke wrote:
Hi folks,

I've looked around for this particular question, but haven't found a good
answer. I have a versioned dataset that includes about 6 csv files that
total about 15MB for each version. The versions get updated every few years
or so and are used to drive the model which was written in C++ but is now
inside an Rcpp wrapper. Apart from the fact that CRAN does not permit large
files, I want to have a better way for users to access particular versions
of the dataset.

Usage idea:
# The following would hopefully also download the default/most recent
# version of the csv files from CRAN (if allowed), GitHub, or some other
# repository for academic open source data.
install.packages("MyPackage")
mypackage = new(MyPackage)

Then, if necessary, the user could change the dataset used with something
like:
mypackage$dataset("2.1.0")
which would retrieve the new csv files if they haven't already been
downloaded and update the data_folder path internally to point to the
2.1.0 directory.
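
The download-if-missing step described above could be sketched roughly as follows. This is an illustration under assumptions, not an existing API: the function name use_dataset, the base_url layout (base_url/version/file) and the file names are placeholders. The fetch argument defaults to utils::download.file and exists only so the transfer can be swapped out, for example in tests.

```r
# Sketch only: use_dataset, base_url and the file names are placeholders.
use_dataset <- function(version,
                        files,
                        base_url,
                        cache_dir = tools::R_user_dir("MyPackage", which = "cache"),
                        fetch = utils::download.file) {
  version_dir <- file.path(cache_dir, version)
  dir.create(version_dir, recursive = TRUE, showWarnings = FALSE)
  for (f in files) {
    dest <- file.path(version_dir, f)
    if (!file.exists(dest)) {
      # Download under a temporary name first so an interrupted transfer
      # never leaves a partial csv behind under the final name.
      tmp <- paste0(dest, ".part")
      status <- fetch(paste(base_url, version, f, sep = "/"), tmp, mode = "wb")
      if (!identical(as.integer(status), 0L)) stop("download failed for ", f)
      if (!file.rename(tmp, dest)) stop("could not move ", tmp, " into place")
    }
  }
  # Directory holding the csv files for this version.
  version_dir
}
```

The returned directory is what would be handed to the C++ side as its data_folder path. Files already present in the cache are not fetched again.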

Requirements:
- The dataset is csv (not an R data object) and the Rcpp MyPackage expects
this format
- Would be nice to properly include citations for the data as they will
likely be initially released through a journal publication
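
For the citation requirement, one possible route is a bibentry() entry placed in the package's inst/CITATION file, which citation("MyPackage") would then display. A minimal sketch, in which every field value is a placeholder to be replaced once the publication exists:

```r
# Sketch of an entry for a hypothetical inst/CITATION file.
# Title, author, journal and year are placeholders, not a real reference.
data_citation <- utils::bibentry(
  bibtype = "Article",
  title   = "Title of the data paper (placeholder)",
  author  = utils::person("First", "Author"),
  journal = "Journal name (placeholder)",
  year    = "YYYY",
  header  = "To cite the MyPackage dataset in publications, please use:"
)
print(data_citation)
```

In an actual CITATION file the bibentry() call would typically stand on its own; it is assigned here only so the result can be inspected.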

What is the best practice for this sort of dataset management for a package
in R? Is it okay to use GitHub to store and version the data? Or is it
preferred to use an R package (ignoring the file size limit), or some other
open source data hosting? I see https://r-universe.dev/ as an option as
well. In any case, what is the proper mechanism for retrieving/caching the
data?

Thanks,

-John

John Clarke | Senior Technical Advisor |
Cornerstone Systems Northwest | john.cla...@cornerstonenw.com

______________________________________________
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel
