Re: [Bioc-devel] How to use RData files in Bioconductor data and software packages

2020-01-16 Thread web working

Hi Herve,

thank you for your answer. To be honest, I am fine if the data sets cannot
be loaded with the data() solution.


I was wrong when I said that the warning occurred during BiocCheck. It
actually occurred during R CMD check of my data package. I ran the check
with the R CMD check environment settings recommended for Bioconductor
(https://github.com/Bioconductor/Contributions/blob/master/CONTRIBUTING.md#r-cmd-check-environment)
and the following call:

  devtools::check(document = FALSE, args = c('--no-build-vignettes'),
                  build_args = c('--resave-data', '--no-build-vignettes'))


During the "Building" step of R CMD check, something "strange" happens:
a lot of text is printed to the screen:


...

─  checking for empty or unneeded directories

─  looking to see if a ‘data/datalist’ file should be added
 dispersionFunction
 fitType
 varLogDispEsts
 dispPriorVar
   dispFunction
   fit
   d
   means
   disps
   minDisp
   
 means
   class
   class
   class
   class
 rowRanges
 unlistData
 elementMetadata
 elementType
 metadata
 partitioning
 class
 colData
 rownames
 nrows
 listData
   [1]
   [2]
 elementType

...

At the "Checking" step I got a WARNING followed by the same kind of output:

...

* checking Rd cross-references ... OK
* checking for missing documentation entries ... OK
* checking for code/documentation mismatches ... WARNING

[3]
[4]
[5]
  
    [6]
  
  names
  [2]
  [3]
  [4]
  [5]
    names
    dispersionFunction
    fitType
    varLogDispEsts
    dispPriorVar
    [1]
  [1]
    [1]
    [2]
  names
  class
  row.names
  [2]
    
  means
class

...

This warning only occurs if I store my RData files in the data directory.

Here is an example of the class whose objects I store in the RData files:

#' @rdname dummyDataSet
setClass("dummyDataSet", slots = c(dds = "list",
   genes = "GenomicRanges",
   bamFiles = "list",
   resultTables = "list",
   treatment = "character",
   nameAnalysis = "character",
   numberOfCores = "numeric"),
 validity = function(object) {
   ...
 }, contains = "dummySoftware")

My class consists of lists of data.frames (e.g. resultTables), lists of
S4 objects (e.g. a list of DESeq2 objects (dds)), S4 objects (e.g.
genes) and more. Could this be the reason for this "strange" behavior?
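To narrow down which stored objects produce this output, it may help to look at what each .RData file actually contains before it goes into data/. The following is a minimal base-R sketch; the helper name `describeRData()` is hypothetical, and it only reports the names, classes and (for S4 objects) slot names of the stored objects. It does not diagnose the check warning itself.

```r
# Hypothetical helper: summarise the objects stored in an .RData file
# without touching the global environment.
describeRData <- function(path) {
  env <- new.env(parent = emptyenv())
  # load() returns the names of the objects it restored into 'env'
  objNames <- load(path, envir = env)
  lapply(stats::setNames(objNames, objNames), function(nm) {
    obj <- get(nm, envir = env)
    list(
      class = class(obj),
      isS4  = isS4(obj),
      # slotNames() on an S4 instance lists its slots; empty for non-S4 objects
      slots = if (isS4(obj)) methods::slotNames(obj) else character(0)
    )
  })
}
```

Running it on each file under data/ shows at a glance which files hold nested S4 objects (such as lists of DESeq2 objects) and which hold plain data.frames.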


If you need some more information, just let me know.

Here is my sessionInfo():

> sessionInfo()
R version 3.6.1 (2019-07-05)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.3 LTS

Matrix products: default
BLAS:   /home/x/Programme/R_Versions/R-3.6.1/lib/libRblas.so
LAPACK: /home/x/Programme/R_Versions/R-3.6.1/lib/libRlapack.so

locale:
 [1] LC_CTYPE=de_DE.UTF-8       LC_NUMERIC=C               LC_TIME=de_DE.UTF-8        LC_COLLATE=de_DE.UTF-8     LC_MONETARY=de_DE.UTF-8
 [6] LC_MESSAGES=de_DE.UTF-8    LC_PAPER=de_DE.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=de_DE.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics  grDevices utils datasets  methods base

other attached packages:
[1] dummyData_0.1.0

loaded via a namespace (and not attached):
[1] compiler_3.6.1 tools_3.6.1    yaml_2.2.0

Best,

Tobias



Re: [Bioc-devel] How to use RData files in Bioconductor data and software packages

2020-01-16 Thread web working
Thank you for your example, Kasper. Using require() looks like a workable
option for me. I am following the Bioconductor guidelines on circular
dependencies between related packages
(https://github.com/Bioconductor/Contributions/blob/master/CONTRIBUTING.md#submitting-related-packages)
to implement my software and data packages, connecting them via "Suggests"
and "Depends".



Re: [Bioc-devel] How to use RData files in Bioconductor data and software packages

2020-01-14 Thread Kasper Daniel Hansen
Tobias,

When you use the data() command on the data package, you first need to run
  library(dummyData)
(and your software package therefore needs Suggests: dummyData).

Here is an example from minfi/minfiData

if (require(minfiData)) {
  dat <- preprocessIllumina(RGsetEx, bg.correct=FALSE, normalize="controls")
}

Note how I use require to load the package. For clarity you could argue I
should also have
  data(RGsetEx)
but it is technically not necessary because of lazy loading.
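A variant of the same pattern, sketched here under the assumption that the software package lists the data package under Suggests: instead of attaching it with require(), an example can check for the package with requireNamespace() and load the object explicitly with data(..., package = ). The helper name `loadExampleData()` is hypothetical; since `dummyData` is a placeholder from this thread, the usage line uses the `datasets` package that ships with R.

```r
# Hypothetical helper: load a named data set from a (suggested) data package
# into a private environment and return it, or NULL if the package is missing.
loadExampleData <- function(name, package) {
  if (!requireNamespace(package, quietly = TRUE)) {
    message("Package '", package, "' is not installed; skipping.")
    return(NULL)
  }
  env <- new.env()
  # With 'package' and 'envir' given, data() does not need the package attached
  utils::data(list = name, package = package, envir = env)
  get(name, envir = env, inherits = FALSE)
}

# Usage with a package that is always available:
mt <- loadExampleData("mtcars", "datasets")
```

The trade-off versus require() is that nothing is added to the search path, at the cost of naming the data set explicitly.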






Re: [Bioc-devel] How to use RData files in Bioconductor data and software packages

2020-01-09 Thread Pages, Herve
On 1/9/20 13:00, web working wrote:
> Hi Herve,
> 
> thank you for your detailed answer. I guess I expressed myself
> unclearly. The BED files were just examples of data I store in the
> inst/extdata folder. Based on the description in ExperimentHubData, I
> decided to create a software package and a data package (rather than an
> ExperimentHub-based package). In my RData files I store objects of my
> software package's classes. These objects are bigger than 5 MB. Using a
> helper function is not an option, because computing the objects takes
> too much time. For this reason I want to load these objects in my
> function examples. My question is whether storing my RData files in the
> inst/extdata directory is correct or not.

It's technically correct but it's not as convenient as putting them in
data/ because they can no longer be listed and/or loaded with data().
So if you're storing them in inst/extdata only because the data()
solution gave you a BiocCheck warning, then I'd say that you're giving up
too easily ;-)
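To illustrate the difference described above, here is a small sketch that uses the base `datasets` package as a stand-in for a data package: objects under data/ show up in data(package = ) and can be loaded by name, whereas files under inst/extdata are only reachable as file paths via system.file() and must be read explicitly. The `subset.bed` file name and the `dummyData` package are placeholders from this thread.

```r
# Objects shipped in a package's data/ directory can be listed ...
available <- utils::data(package = "datasets")$results[, "Item"]
head(available)

# ... and loaded by name into an environment of your choice.
env <- new.env()
utils::data("mtcars", package = "datasets", envir = env)

# Files under inst/extdata are installed to extdata/ and are only found by path.
# system.file() returns "" when the package or file does not exist.
bedPath <- system.file("extdata", "subset.bed", package = "dummyData")
if (nzchar(bedPath)) {
  bed <- utils::read.delim(bedPath, header = FALSE)
}
```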

IMO it is important to try to understand why the data() solution gave 
you a BiocCheck warning in the first place. Unfortunately you're not 
providing enough information for us to be able to tell. What does the 
warning say? How can we reproduce the warning? Ideally we would need to 
see a transcript of your session and links to your packages.

Thanks,
H.



Re: [Bioc-devel] How to use RData files in Bioconductor data and software packages

2020-01-09 Thread web working
Hi Richard,

It depends on the file type. I load my non-RData files with read.delim()
and my RData files with a helper function that returns the R object stored
in the RData file:

#' Load an RData file and return its first object
#'
#' Loads an RData file into a fresh environment and returns the first
#' object (in the order returned by load()). If the file contains more
#' than one object, only the first one is returned and a warning is issued.
#'
#' @param RDataFile a \code{character} string giving the path to the RData file.
#'
#' @return The first \code{R} object stored in the RData file.
#' @export
#' @examples
#' # load a GRanges object stored in an RData file
#' dummy.GRanges <- loadRData(system.file('extdata', 'dummy.RData',
#'                                        package = "dummyData"))
loadRData <- function(RDataFile) {
  # Load into a private environment so the function's own variables
  # can never clash with the names of the stored objects
  env <- new.env(parent = emptyenv())
  objectNames <- load(RDataFile, envir = env)  # names of the loaded objects
  if (length(objectNames) > 1)
    warning("RData file contains more than one object. Only the first object (",
            objectNames[1], ") will be returned!")
  get(objectNames[1], envir = env)
}

I know this is not the best solution. Saving the R objects as RDS files
instead of RData files is probably the better choice here.
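For a single object per file, saveRDS()/readRDS() avoid the name bookkeeping that loadRData() has to do, because readRDS() simply returns the stored object. A minimal sketch, assuming the file would live in the data package's inst/extdata directory; the name `dummy.rds` is hypothetical, and the demonstration itself uses a temporary file.

```r
# In the data package's source tree (run once, not at install time):
# saveRDS(dummyObject, file = "inst/extdata/dummy.rds")

# Self-contained demonstration with a temporary file:
obj <- data.frame(gene = c("a", "b"), logFC = c(1.5, -0.7))
rdsFile <- tempfile(fileext = ".rds")
saveRDS(obj, file = rdsFile)

# readRDS() returns the stored object directly; it can be bound to any name.
restored <- readRDS(rdsFile)

# In the software package's examples, the path would come from system.file():
# restored <- readRDS(system.file("extdata", "dummy.rds", package = "dummyData"))
```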

My question is whether storing the RData (or RDS) files in the
inst/extdata directory is acceptable for a Bioconductor package.

Best,

Tobias


On 09.01.20 at 15:45, Richard Virgen-Slane wrote:
>
> I may be missing a point, but how are you loading the saved files?


___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] How to use RData files in Bioconductor data and software packages

2020-01-09 Thread web working

Hi Herve,

thank you for your detailed answer. I guess I expressed myself
unclearly. The BED files were just examples of data I store in the
inst/extdata folder. Based on the description in ExperimentHubData, I
decided to create a software package and a data package (rather than an
ExperimentHub-based package). In my RData files I store objects of my
software package's classes. These objects are bigger than 5 MB. Using a
helper function is not an option, because computing the objects takes
too much time. For this reason I want to load these objects in my
function examples. My question is whether storing my RData files in the
inst/extdata directory is correct or not.


Best,

Tobias



Re: [Bioc-devel] How to use RData files in Bioconductor data and software packages

2020-01-09 Thread Pages, Herve
Hi Tobias,

If the original data is in BED files, there should be no need to 
serialize the objects obtained by importing the files. It is **much** 
better to provide a small helper function that creates an object from a 
BED file and to use that function each time you need to load an object.

This has at least 2 advantages:
1. It avoids redundant storage of the data.
2. By avoiding serialization of high-level S4 objects, it makes the 
package easier to maintain in the long run.

Note that the helper function could also implement a cache mechanism 
(easy to do with an environment) so the BED file is only loaded and the 
object created the 1st time the function is called. On subsequent calls, 
the object is retrieved from the cache.
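A minimal sketch of such a helper with an environment-based cache. It is an illustration, not code from any package: the name `getBedTable()` is hypothetical, and it reads the BED file with read.delim() into a plain data.frame to stay dependency-free; in practice one would more likely call rtracklayer::import() to get a GRanges.

```r
# The cache lives in the closure created by local(), so it persists across calls.
getBedTable <- local({
  cache <- new.env(parent = emptyenv())
  function(path) {
    key <- path  # the path string as typed is the cache key
    if (!exists(key, envir = cache, inherits = FALSE)) {
      # First call for this file: parse it and store the result
      tbl <- utils::read.delim(path, header = FALSE)
      names(tbl)[1:3] <- c("chrom", "start", "end")  # BED has >= 3 columns
      assign(key, tbl, envir = cache)
    }
    # Subsequent calls: return the cached object without touching the file
    get(key, envir = cache, inherits = FALSE)
  }
})
```

Keying on the literal path keeps the sketch simple; two different spellings of the same path would be cached twice.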

However, if the BED files are really big (e.g. > 50 Mb), we require them 
to be stored on ExperimentHub instead of inside dummyData. Note that you 
still need to provide the dummyData package (which becomes what we call 
an ExperimentHub-based data package). See the "Creating An ExperimentHub 
Package" vignette in the ExperimentHubData package for more information 
about this.

Hope this helps,

H.

On 1/9/20 04:45, web working wrote:
> Dear all,
> 
> I am currently developing a software package (dummySoftware) and a data
> package (dummyData), and I am a bit confused about where to store my RData
> files in the data package. Here is my situation:
> 
> I want to store some software package objects (objects of new classes
> defined in the software package) in the data package. These objects are
> example objects and are too big for a software package. As I have read
> here (http://r-pkgs.had.co.nz/data.html), all RData objects should be
> stored in the data directory of a package.
> BED files of the data package are stored in inst/extdata.
> The data of the data package will be accessed in the software package
> like this: system.file('extdata', 'subset.bed', package = 'dummyData').
> And here the problem occurs. After building the data package
> (devtools::build(args = c('--resave-data'))), all data in data/ are
> stored in datalist, Rdata.rdb, Rdata.rds and Rdata.rdx and cannot be
> accessed with system.file(). Accessing this data with the data()
> function results in a warning during BiocCheck::BiocCheck().
> 
> My solution is to store the RData files in the inst/extdata directory
> and access them with system.file(). Something similar is mentioned here,
> but in the context of a vignette
> (r-pkgs.had.co.nz/data.html#other-data). Is this the way to do it?
> 
> Best,
> Tobias

-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fredhutch.org
Phone:  (206) 667-5791
Fax:(206) 667-1319