Re: [Bioc-devel] BiocFileCache for developers
BiocFileCache has been updated to follow this type of behavior:

- If the location exists, use it without prompting (default user_cache_dir()).
- If it does not exist, prompt the user to create it.
- If the user responds N, or the session is not interactive, use a temporary directory.

This is reflected in devel version 1.3.8.

Lori Shepherd
Bioconductor Core Team
Roswell Park Cancer Institute
Department of Biostatistics & Bioinformatics
Elm & Carlton Streets
Buffalo, New York 14263

From: Michael Love <michaelisaiahl...@gmail.com>
Sent: Saturday, December 9, 2017 5:18:08 PM
To: Henrik Bengtsson
Cc: Shepherd, Lori; bioc-devel@r-project.org
Subject: Re: [Bioc-devel] BiocFileCache for developers

thanks Henrik,

I like the explicitness of the `R.cache` approach and I copied it for my current implementation. For the BiocFileCache location that should be used for this package I'm developing, `tximeta`, I'm now using the following logic:

* If run non-interactively, `tximeta` uses a temporary directory.
* If run interactively, and a location has not been previously saved, the user is prompted if she wants to use (1) the default directory or (2) a temporary directory.
  - If (1), then use the default directory, and save this choice.
  - If (2), then use a temporary directory for the rest of this R session, and ask again next R session.
* The prompt above also mentions that a specific function can be used to manually set the directory at any time point, and this choice is saved.
* The default directory is given by `rappdirs::user_cache_dir("BiocFileCache")`.
* The choice itself of the BiocFileCache directory that `tximeta` should use is saved in a JSON file here `rappdirs::user_cache_dir("tximeta")`.

This email message may contain legally privileged and/or confidential information. If you are not the intended recipient(s), or the employee or agent responsible for the delivery of this message to the intended recipient(s), you are hereby notified that any disclosure, copying, distribution, or use of this email message is prohibited. If you have received this message in error, please notify the sender immediately by e-mail and delete this email message from your computer. Thank you.

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel
Re: [Bioc-devel] BiocFileCache for developers
thanks Henrik,

I like the explicitness of the `R.cache` approach and I copied it for my current implementation. For the BiocFileCache location that should be used for this package I'm developing, `tximeta`, I'm now using the following logic:

* If run non-interactively, `tximeta` uses a temporary directory.
* If run interactively, and a location has not been previously saved, the user is prompted if she wants to use (1) the default directory or (2) a temporary directory.
  - If (1), then use the default directory, and save this choice.
  - If (2), then use a temporary directory for the rest of this R session, and ask again next R session.
* The prompt above also mentions that a specific function can be used to manually set the directory at any time point, and this choice is saved.
* The default directory is given by `rappdirs::user_cache_dir("BiocFileCache")`.
* The choice itself of the BiocFileCache directory that `tximeta` should use is saved in a JSON file here `rappdirs::user_cache_dir("tximeta")`.
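The logic in the list above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the code used in `tximeta`: the function names `chooseCacheDir`, `saveCacheDir` and `loadCacheDir`, the JSON field `bfc_dir`, and the file name in the usage comment are all hypothetical. The JSON helpers assume the `jsonlite` package is installed.

```r
# Hypothetical sketch; none of these names come from tximeta itself.

# Decide which BiocFileCache directory to use.
#   saved          : previously saved directory, or NULL if none was saved
#   is_interactive : whether the session is interactive
#   ask            : function returning 1 (default directory) or 2 (temporary)
#   default_dir    : the persistent default location
# Returns the chosen directory and whether the choice should be saved.
chooseCacheDir <- function(saved, is_interactive, default_dir,
                           ask = function() 2L) {
  # Non-interactive sessions always use a temporary directory.
  if (!is_interactive) {
    return(list(dir = tempdir(), save = FALSE))
  }
  # A previously saved choice is reused without prompting.
  if (!is.null(saved)) {
    return(list(dir = saved, save = FALSE))
  }
  # Otherwise ask: (1) default directory, saved; (2) temporary, ask again later.
  if (identical(as.integer(ask()), 1L)) {
    list(dir = default_dir, save = TRUE)
  } else {
    list(dir = tempdir(), save = FALSE)
  }
}

# Persist the choice in a small JSON file (assumes 'jsonlite').
saveCacheDir <- function(dir, config_file) {
  dir.create(dirname(config_file), recursive = TRUE, showWarnings = FALSE)
  jsonlite::write_json(list(bfc_dir = dir), config_file, auto_unbox = TRUE)
}

# Read the saved choice back, or NULL if nothing was saved yet.
loadCacheDir <- function(config_file) {
  if (!file.exists(config_file)) return(NULL)
  jsonlite::read_json(config_file)$bfc_dir
}

# Possible use (the file name "bfcloc.json" is made up for this sketch):
# config <- file.path(rappdirs::user_cache_dir("tximeta"), "bfcloc.json")
# choice <- chooseCacheDir(
#   saved = loadCacheDir(config),
#   is_interactive = interactive(),
#   default_dir = rappdirs::user_cache_dir("BiocFileCache"),
#   ask = function() utils::menu(c("default directory", "temporary directory"))
# )
# if (choice$save) saveCacheDir(choice$dir, config)
```

Keeping the decision in a function that takes the session state as arguments makes it easy to exercise each branch without an actual prompt.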
Re: [Bioc-devel] BiocFileCache for developers
R.cache (>= 0.6.0) does the following to acquire a persistent cache (root) folder. This behavior was introduced after CRAN asked me not to write to disk by default (they had found "funny" folders on their check servers), followed by an email conversation with CRAN (2011-12-29) and an "ok with me" from Uwe@CRAN:

1. When loaded (not only attached), it checks for the existence of a cache folder (defaults to ~/.Rcache unless an R option or an environment variable is set). If it exists, then we're good to go.

2. If the cache folder does not exist, and in a non-interactive session, then a temporary cache folder specific to that R session is used.

3. If the cache folder does not exist, and in an interactive session, then the user will be queried whether they'd like to create ~/.Rcache (the default choice) or whether they'd like to use a temporary folder (just as in the non-interactive case). If accepting ~/.Rcache, then that will be available across sessions (Step 1 above).

The gist is: Make sure to get the user's approval before storing anything permanently, and don't do anything that surprises the user, risks overwriting their files, etc.

Here is a real-world example on a "fresh" user account:

# Non-interactive sessions or user does not approve

$ Rscript -e "R.cache::getCacheRootPath()"
[1] "/tmp/RtmpzIZT4o/.Rcache"

$ R --vanilla
> dummy <- loadNamespace("R.cache")
The R.cache package needs to create a directory that will hold cache
files. It is convenient to use one in the user's home directory,
because it remains also after restarting R. Do you wish to create the
'~/.Rcache/' directory? If not, a temporary directory
(/tmp/RtmpMA4LTF/.Rcache) that is specific to this R session will be
used. [Y/n]: n
> R.cache::getCacheRootPath()
[1] "/tmp/Rtmp0Ic5zQ/.Rcache"
> quit("no")

$ R --vanilla
> R.cache::getCacheRootPath()
The R.cache package needs to create a directory that will hold cache
files. It is convenient to use one in the user's home directory,
because it remains also after restarting R. Do you wish to create the
'~/.Rcache/' directory? If not, a temporary directory
(/tmp/RtmpzSJd3d/.Rcache) that is specific to this R session will be
used. [Y/n]: n
[1] "/tmp/RtmpzSJd3d/.Rcache"
> quit("no")

$ Rscript -e "R.cache::getCacheRootPath()"
[1] "/tmp/Rtmpq1nx0H/.Rcache"

# User approves or already approved

$ R --vanilla
> dummy <- loadNamespace("R.cache")
The R.cache package needs to create a directory that will hold cache
files. It is convenient to use one in the user's home directory,
because it remains also after restarting R. Do you wish to create the
'~/.Rcache/' directory? If not, a temporary directory
(/tmp/RtmpMA4LTF/.Rcache) that is specific to this R session will be
used. [Y/n]: Y
> R.cache::getCacheRootPath()
[1] "~/.Rcache/"
> quit("no")

$ Rscript -e "R.cache::getCacheRootPath()"
[1] "~/.Rcache/"

$ R --vanilla
> dummy <- loadNamespace("R.cache")
> R.cache::getCacheRootPath()
[1] "~/.Rcache/"

The same applies when using library("R.cache") as well as when the R.cache namespace is imported by another package.

This behavior also plays well with 'R CMD check' and 'R CMD check --as-cran', where the cache folder will default to a temporary folder. It will also prevent run-time errors since there will always be a cache folder available (although it'll only survive the current session). R.cache works the same on all OSes.

To further lower the risk of "what is this ~/.Rcache folder doing here?", R.cache also adds a ~/.Rcache/README.txt file explaining what that folder is and what created it.

About what the default location should be:

On Fri, Dec 1, 2017 at 8:06 AM, Sean Davis wrote:
[...]
> On some systems, the user home directory is not large (such as on HPC
> systems) or has strong quotas. The default user_cache_dir may not be the
> best choice there.

I agree with this, but it's hard to find a solid, simple alternative to the user's home folder. However, https://cran.r-project.org/package=rappdirs (on my todo list to investigate) may provide a better approach because it follows OS-specific recommendations.

Back to writing to the user's home folder: in HPC environments with limited home quota, I simply do things like ln -s /scratch/$USER/.Rcache ~/.Rcache.

/Henrik

On Fri, Dec 1, 2017 at 8:32 AM, Michael Love wrote:
> One solution if a developer really wants to make sure the user knows
> that the function will store a cache somewhere would be to leave the
> BiocFileCache location argument without a default value.
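For readers who want to reuse the three steps described above in their own package, a minimal sketch might look like the following. It is not the R.cache source code: the function name `resolveCacheRoot`, its arguments, and the wording of the prompt and README text are assumptions made for illustration.

```r
# Hypothetical sketch of the three steps; not taken from R.cache.
#   path           : the persistent cache folder to look for
#   is_interactive : whether the session is interactive
#   prompt         : function that shows a message and returns the answer
resolveCacheRoot <- function(path = "~/.Rcache",
                             is_interactive = interactive(),
                             prompt = function(msg) readline(msg)) {
  # Step 1: an existing folder means the user approved it earlier.
  if (dir.exists(path)) return(path)

  # Session-specific fallback inside the R session's temporary directory.
  tmp <- file.path(tempdir(), ".Rcache")
  useTemp <- function() {
    dir.create(tmp, recursive = TRUE, showWarnings = FALSE)
    tmp
  }

  # Step 2: never write outside tempdir() in a non-interactive session.
  if (!is_interactive) return(useTemp())

  # Step 3: ask for approval; an empty answer accepts the default (Y).
  msg <- sprintf(
    "Create the '%s' directory? If not, a temporary directory (%s) will be used. [Y/n]: ",
    path, tmp
  )
  answer <- tolower(trimws(prompt(msg)))
  if (answer %in% c("", "y", "yes")) {
    dir.create(path, recursive = TRUE)
    # Leave a note so the folder is not a mystery later on.
    writeLines("This folder holds cache files created by an R package.",
               file.path(path, "README.txt"))
    return(path)
  }
  useTemp()
}
```

Passing the prompt in as a function keeps the approval step testable and lets a package substitute its own wording.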
Re: [Bioc-devel] BiocFileCache for developers
> user_cache_dir(appname="mikes-package-name")

wow, how did you guess it?

I'm storing TxDb's for use across sessions with `rname` set to the basename of the GTF file, e.g. "gencode.v27.annotation.gtf.gz". I want to encourage the serendipitous case that there is already a BiocFileCache entry with this `rname` created outside of the use of my package. I can see this happening, especially if I mention this naming pattern in the vignette.

I'm thinking I will encourage the user to pick a good BiocFileCache location by not setting a default value. Potentially multiple users could be sharing the same BiocFileCache location, e.g. a lab space on HPC. And then actively specifying NULL for the location (or something like this) could switch the location to:

user_cache_dir(appname = "BiocFileCache")
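The idea of a location argument without a default, with NULL as an explicit request for the shared default, might be sketched like this. The helper names `resolveBfcDir` and `findTxDbEntry` and the argument name `cache_dir` are hypothetical; the second helper assumes the `BiocFileCache` and `rappdirs` packages are installed.

```r
# Hypothetical helpers; names and arguments are illustrative only.

# 'cache_dir' deliberately has no default, so the caller has to choose.
resolveBfcDir <- function(cache_dir) {
  if (missing(cache_dir)) {
    stop("please supply 'cache_dir' (a BiocFileCache location), ",
         "or NULL to use the default user cache directory", call. = FALSE)
  }
  # An explicit NULL switches to the default BiocFileCache location.
  if (is.null(cache_dir)) {
    cache_dir <- rappdirs::user_cache_dir(appname = "BiocFileCache")
  }
  cache_dir
}

# Look for an existing entry whose rname matches the GTF basename, so that
# a TxDb cached outside of this package can be picked up.
findTxDbEntry <- function(cache_dir, rname) {
  bfc <- BiocFileCache::BiocFileCache(resolveBfcDir(cache_dir))
  # Note: the query is matched as a pattern, so the dots in a file name
  # such as "gencode.v27.annotation.gtf.gz" match any character.
  BiocFileCache::bfcquery(bfc, rname, field = "rname")
}

# Possible use with a shared lab location (the path is made up):
# hits <- findTxDbEntry("/lab/shared/bfc", "gencode.v27.annotation.gtf.gz")
# nrow(hits) > 0
```

Whether a missing argument should stop with an error or trigger a prompt is a design choice; the error shown here is the stricter of the two.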
Re: [Bioc-devel] BiocFileCache for developers
On 12/01/2017 11:23 AM, Sean Davis wrote:
> On Fri, Dec 1, 2017 at 11:16 AM, Shepherd, Lori <lori.sheph...@roswellpark.org> wrote:
>
>> So having a user argument might be best. Or defining a unique cache
>> location for your package would be another option.
>
> The R package development policies actually have a statement that may be
> helpful in thinking about this. Your mileage may vary in the interpretation.
>
>   - Packages should not write in the users’ home filespace, nor
>     anywhere else on the file system apart from the R session’s temporary
>     directory (or during installation in the location pointed to by TMPDIR:
>     and such usage should be cleaned up). Installing into the system’s R
>     installation (e.g., scripts to its bin directory) is not allowed.
>
>     Limited exceptions may be allowed in interactive sessions if the
>     package obtains confirmation from the user.

Actually, CRAN policies. The CRAN policy is definitely appropriate for vignette and example code, and certainly functions by default should not write to locations where they will potentially overwrite existing resources. The policy makes it impossible to write files that persist across sessions, which is the objective for BiocFileCache.

For the original question, I think there's often a case for

    user_cache_dir(appname="mikes-package-name")

Martin

> https://cran.r-project.org/web/packages/policies.html
>
> Sean
Re: [Bioc-devel] BiocFileCache for developers
Unfortunately, I think there are a number of packages that don't necessarily adhere to this. For Bioconductor packages, we always try to make sure that any example or vignette code follows this policy. I think an exception may be made if the writing is part of the main functionality of the package and is noted prominently in the package documentation.

Lori Shepherd
Bioconductor Core Team
Roswell Park Cancer Institute
Department of Biostatistics & Bioinformatics
Elm & Carlton Streets
Buffalo, New York 14263

From: Sean Davis <seand...@gmail.com>
Sent: Friday, December 1, 2017 11:23:39 AM
To: Shepherd, Lori
Cc: Michael Love; bioc-devel@r-project.org
Subject: Re: [Bioc-devel] BiocFileCache for developers

On Fri, Dec 1, 2017 at 11:16 AM, Shepherd, Lori <lori.sheph...@roswellpark.org> wrote:

> So having a user argument might be best. Or defining a unique cache
> location for your package would be another option.

The R package development policies actually have a statement that may be helpful in thinking about this. Your mileage may vary in the interpretation.

  - Packages should not write in the users’ home filespace, nor anywhere else on the file system apart from the R session’s temporary directory (or during installation in the location pointed to by TMPDIR: and such usage should be cleaned up). Installing into the system’s R installation (e.g., scripts to its bin directory) is not allowed.

    Limited exceptions may be allowed in interactive sessions if the package obtains confirmation from the user.

https://cran.r-project.org/web/packages/policies.html

Sean
Re: [Bioc-devel] BiocFileCache for developers
So having a user argument might be best. Or defining a unique cache location for your package would be another option.

Lori Shepherd
Bioconductor Core Team
Roswell Park Cancer Institute
Department of Biostatistics & Bioinformatics
Elm & Carlton Streets
Buffalo, New York 14263

From: Bioc-devel <bioc-devel-boun...@r-project.org> on behalf of Sean Davis <seand...@gmail.com>
Sent: Friday, December 1, 2017 11:06:39 AM
To: Michael Love
Cc: bioc-devel@r-project.org
Subject: Re: [Bioc-devel] BiocFileCache for developers

On Fri, Dec 1, 2017 at 10:28 AM, Michael Love <michaelisaiahl...@gmail.com> wrote:

> hi,
>
> I'm writing a function which currently uses BiocFileCache to store a
> small data.frame and one or more TxDb objects, so that these objects
> are persistent and available across sessions (or possible available to
> multiple users).
>
> In the simplest case, I would call
>
> bfc <- BiocFileCache()
>
> inside my function, which will check the default location:
>
> user_cache_dir(appname = "BiocFileCache")
>
> In general, should developers also support the user specifying a
> specific location for the BiocFileCache? So functions using
> BiocFileCache should have an argument that overrides the above
> location?

On some systems, the user home directory is not large (such as on HPC systems) or has strong quotas. The default user_cache_dir may not be the best choice there.

Sean
Re: [Bioc-devel] BiocFileCache for developers
One solution, if a developer really wants to make sure the user knows that the function will store a cache somewhere, would be to leave the BiocFileCache location argument without a default value.
Re: [Bioc-devel] BiocFileCache for developers
On Fri, Dec 1, 2017 at 11:16 AM, Shepherd, Lori <lori.sheph...@roswellpark.org> wrote:

> So having a user argument might be best. Or defining a unique cache
> location for your package would be another option.

The R package development policies actually have a statement that may be helpful in thinking about this. Your mileage may vary in the interpretation.

  - Packages should not write in the users’ home filespace, nor
    anywhere else on the file system apart from the R session’s temporary
    directory (or during installation in the location pointed to by TMPDIR:
    and such usage should be cleaned up). Installing into the system’s R
    installation (e.g., scripts to its bin directory) is not allowed.

    Limited exceptions may be allowed in interactive sessions if the
    package obtains confirmation from the user.

https://cran.r-project.org/web/packages/policies.html

Sean

--
Sean Davis, MD, PhD
Center for Cancer Research
National Cancer Institute
National Institutes of Health
Bethesda, MD 20892
https://seandavi.github.io/
https://twitter.com/seandavis12
Re: [Bioc-devel] BiocFileCache for developers
On Fri, Dec 1, 2017 at 10:28 AM, Michael Love wrote:

> hi,
>
> I'm writing a function which currently uses BiocFileCache to store a
> small data.frame and one or more TxDb objects, so that these objects
> are persistent and available across sessions (or possible available to
> multiple users).
>
> In the simplest case, I would call
>
> bfc <- BiocFileCache()
>
> inside my function, which will check the default location:
>
> user_cache_dir(appname = "BiocFileCache")
>
> In general, should developers also support the user specifying a
> specific location for the BiocFileCache? So functions using
> BiocFileCache should have an argument that overrides the above
> location?

On some systems, the user home directory is not large (such as on HPC systems) or has strong quotas. The default user_cache_dir may not be the best choice there.

Sean

> thanks,
> Mike

--
Sean Davis, MD, PhD
Center for Cancer Research
National Cancer Institute
National Institutes of Health
Bethesda, MD 20892
https://seandavi.github.io/
https://twitter.com/seandavis12
Re: [Bioc-devel] BiocFileCache for developers
If you are using it as a helper function, that may be too much exposure, and you may just want it running behind the scenes in the default location; but it could be given as an option to the user. I guess it is a coding preference. If a user-specified directory is used, they will have to remember to supply it each time they use your package, or it will redownload.

There shouldn't be a concern about overwriting files in the default cache location, as files added to the cache get a random identifier to try to avoid overwriting and to allow for essentially duplicate entries.

You can always get the cache location of a bfc object by calling bfccache(bfc), in case a user-specified directory is used.

Lori Shepherd
Bioconductor Core Team
Roswell Park Cancer Institute
Department of Biostatistics & Bioinformatics
Elm & Carlton Streets
Buffalo, New York 14263

From: Bioc-devel on behalf of Michael Love
Sent: Friday, December 1, 2017 10:28:48 AM
To: bioc-devel@r-project.org
Subject: [Bioc-devel] BiocFileCache for developers

hi,

I'm writing a function which currently uses BiocFileCache to store a small data.frame and one or more TxDb objects, so that these objects are persistent and available across sessions (or possible available to multiple users).

In the simplest case, I would call

bfc <- BiocFileCache()

inside my function, which will check the default location:

user_cache_dir(appname = "BiocFileCache")

In general, should developers also support the user specifying a specific location for the BiocFileCache? So functions using BiocFileCache should have an argument that overrides the above location?

thanks,
Mike
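The two points above (entries get a random identifier, and bfccache() reports the location) can be tried out with a small, throwaway example. This is a sketch under assumptions: it needs the `BiocFileCache` package, it uses a directory under tempdir() so that nothing persistent is written, and the resource name "my-resource" is made up. The `ask = FALSE` argument is assumed to be available in the installed version.

```r
# Runs only if BiocFileCache is installed; writes only under tempdir().
if (requireNamespace("BiocFileCache", quietly = TRUE)) {
  # A user-specified, throwaway cache location.
  cache_dir <- file.path(tempdir(), "demo-bfc")
  bfc <- BiocFileCache::BiocFileCache(cache_dir, ask = FALSE)

  # Register two entries under the same rname; each call returns the
  # file path reserved for that entry.
  p1 <- BiocFileCache::bfcnew(bfc, rname = "my-resource")
  p2 <- BiocFileCache::bfcnew(bfc, rname = "my-resource")

  # The reserved paths differ, so the entries do not overwrite each other.
  print(unname(p1) != unname(p2))

  # Recover where this cache lives from the object itself.
  print(BiocFileCache::bfccache(bfc))
}
```

Because duplicates are allowed, a package that wants to reuse an existing entry has to query by `rname` first rather than rely on the cache to deduplicate.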