Re: [Bioc-devel] How to use RData files in Bioconductor data and software packages

2020-01-14 Thread Kasper Daniel Hansen
Tobias,

When you use the data() command on the data package, you first need to do
  library(dummyData)
(and the software package therefore needs dummyData in its Suggests: field).

Here is an example from minfi/minfiData

if (require(minfiData)) {
  dat <- preprocessIllumina(RGsetEx, bg.correct=FALSE, normalize="controls")
}

Note how I use require to load the package. For clarity you could argue I
should also have
  data(RGsetEx)
but it is technically not necessary because of lazy loading.
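
A minimal sketch of the same guard with the explicit data() call added. The
built-in datasets package and its iris data set are stand-ins (assumptions
made only so the snippet runs without extra installs) for a data package
such as dummyData and one of the objects in its data/ directory. Note that
this variant uses requireNamespace() rather than require(): it only checks
that the package is installed, and data() is then pointed at the package by
name instead of relying on it being attached.

```r
# 'datasets' and 'iris' are stand-ins for a data package (e.g. dummyData)
# and one of the objects stored in its data/ directory.
if (requireNamespace("datasets", quietly = TRUE)) {
  # Load the object into a private environment instead of the workspace.
  env <- new.env()
  data("iris", package = "datasets", envir = env)
  dat <- env$iris
  print(dim(dat))
}
```

Loading into a separate environment keeps the example from overwriting
anything the user already has in the global workspace.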

On Thu, Jan 9, 2020 at 4:40 PM Pages, Herve wrote:

> On 1/9/20 13:00, web working wrote:
> > Hi Herve,
> >
> > thank you for your detailed answer. I guess I expressed myself
> > unclearly. The BED files were just examples of data I store in the
> > inst/extdata folder. Based on the description for ExperimentHubData I
> > have decided to create a software and a data package (no
> > ExperimentHubData software package). In my RData files I store software
> > package objects. These objects are bigger than 5 MB. Using a helper
> > function is not an option, because computing the objects takes too much
> > time. For this reason I want to load these objects for my example
> > functions. My question is whether storing my RData files in the
> > inst/extdata directory is correct or not.
>
> It's technically correct but it's not as convenient as putting them in
> data/ because they can no longer be listed and/or loaded with data().
> So if you're storing them in inst/extdata only because the data()
> solution gave you a BiocCheck warning then I'd say that you're giving up
> too easily ;-)
>
> IMO it is important to try to understand why the data() solution gave
> you a BiocCheck warning in the first place. Unfortunately you're not
> providing enough information for us to be able to tell. What does the
> warning say? How can we reproduce the warning? Ideally we would need to
> see a transcript of your session and links to your packages.
>
> Thanks,
> H.
>
>
> >
> > Best,
> >
> > Tobias
> >
> > On 09.01.20 at 17:59, Pages, Herve wrote:
> >> Hi Tobias,
> >>
> >> If the original data is in BED files, there should be no need to
> >> serialize the objects obtained by importing the files. It is **much**
> >> better to provide a small helper function that creates an object from a
> >> BED file and to use that function each time you need to load an object.
> >>
> >> This has at least 2 advantages:
> >> 1. It avoids redundant storage of the data.
> >> 2. By avoiding serialization of high-level S4 objects, it makes the
> >> package easier to maintain in the long run.
> >>
> >> Note that the helper function could also implement a cache mechanism
> >> (easy to do with an environment) so the BED file is only loaded and the
> >> object created the 1st time the function is called. On subsequent calls,
> >> the object is retrieved from the cache.
> >>
> >> However, if the BED files are really big (e.g. > 50 Mb), we require them
> >> to be stored on ExperimentHub instead of inside dummyData. Note that you
> >> still need to provide the dummyData package (which becomes what we call
> >> an ExperimentHub-based data package). See the "Creating An ExperimentHub
> >> Package" vignette in the ExperimentHubData package for more information
> >> about this.
> >>
> >> Hope this helps,
> >>
> >> H.
> >>
> >> On 1/9/20 04:45, web working wrote:
> >>> Dear all,
> >>>
> >>> I am currently developing a software package (dummySoftware) and a data
> >>> package (dummyData) and I am a bit confused about where to store my RData
> >>> files in the data package. Here is my situation:
> >>>
> >>> I want to store some software package objects (objects of the new classes
> >>> of the software package) in the data package. These objects are example
> >>> objects and are too big for software packages. As I have read here
> >>> (http://r-pkgs.had.co.nz/data.html), all RData objects should be stored
> >>> in the data directory of a package.
> >>> BED files of the data package are stored in inst/extdata.
> >>> The data of the data package will be addressed in the software package
> >>> like this: system.file('extdata', 'subset.bed', package = 'dummyData').
> >>> And here the problem occurs. After building the data package
> >>> (devtools::build(args = c('--resave-data'))), all data in data/ are
> >>> stored in a datalist, Rdata.rdb, Rdata.rds and Rdata.rdx and cannot be
> >>> addressed with system.file. Addressing these data with the data()
> >>> function results in a warning during BiocCheck::BiocCheck().
> >>>
> >>> My solution is to store the RData files in the inst/extdata directory
> >>> and address them with system.file. Something similar is mentioned here,
> >>> but in the context of a vignette
> >>> (r-pkgs.had.co.nz/data.html#other-data). Is this the right way to do it?
> >>>
> >>> Best,
> 
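
The cache mechanism Herve describes in the quoted message (a helper that
builds the object on the first call and retrieves it from an environment
afterwards) could be sketched as follows. This is only an illustration under
stated assumptions: load_bed_cached() is a hypothetical name, the file is
assumed to be a plain three-column, tab-separated BED file, and read.table()
stands in for whatever importer the package really uses (for example
rtracklayer::import()). A temporary file takes the place of
system.file('extdata', 'subset.bed', package = 'dummyData').

```r
# Private environment used as the cache; it lives as long as the R session.
.cache <- new.env(parent = emptyenv())

# Hypothetical helper: read a three-column BED-like file into a data.frame
# the first time it is requested, then serve it from the cache.
load_bed_cached <- function(path) {
  key <- normalizePath(path, mustWork = TRUE)
  if (!exists(key, envir = .cache, inherits = FALSE)) {
    bed <- read.table(path, sep = "\t", header = FALSE,
                      col.names = c("chrom", "start", "end"),
                      stringsAsFactors = FALSE)
    assign(key, bed, envir = .cache)
  }
  get(key, envir = .cache, inherits = FALSE)
}

# Demo with a temporary file standing in for a file under inst/extdata.
bed_file <- tempfile(fileext = ".bed")
writeLines(c("chr1\t100\t200", "chr2\t300\t400"), bed_file)

first  <- load_bed_cached(bed_file)  # reads the file and fills the cache
second <- load_bed_cached(bed_file)  # served from the cache
stopifnot(identical(first, second))
```

Keying the cache on the normalized path means two different spellings of the
same file share one cached object; in a package the environment would
normally be created once at the top level of an R/ source file.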

Re: [Bioc-devel] Question about github repo

2020-01-14 Thread Andris Jankevics
Hi Krutik,


You are looking at the "problem" from the wrong angle. If your PhD funding body
has not raised any reason to keep your repository closed, it is to your own
benefit to have all the code public:

a) There is a chance that someone else in the world is working on something very
similar. Having an open repository with the full development history is a very
good way to show publishers, your thesis committee, collaborators, future
employers or even enemies that you have been working on the idea for a while;

b) There is a chance that, instead of writing their own tools based on an
approach similar to yours, other researchers will want to contribute to your
open project, making your package more versatile and competitive.


Best Regards,

Andris


From: Bioc-devel on behalf of Krutik Patel (PGR)
Sent: 14 January 2020 10:49:49
To: bioc-devel@r-project.org
Subject: [Bioc-devel] Question about github repo

Hi Bioconductor dev team,

I am ready to submit my first package to Bioconductor. It is currently in a
private GitHub repo and I wish to start the submission process. Do I need to
make my GitHub repo public before sending it to the Bioconductor Contributions
GitHub repo? I understand that the code and concept of my project need to be
made public, but I have some worries about individuals potentially
scooping my work once it is public. This package is part of my PhD so I
am perhaps a bit paranoid about this. Apologies if this is a bit of a silly
question. Hope to hear back soon.

Kind Regards,
Krutik Patel

[[alternative HTML version deleted]]

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


