This is a great comment if the primary use of the data is to make the data
available.

Admittedly, a change to the internals of the class requires changing the
data package, and that is a real drawback of my recommendation.  I have had
to do this on several occasions.
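A minimal sketch of why that happens, assuming a hypothetical class `FooRecord` (not a class from any package mentioned here): an object serialized under one class definition no longer validates once a slot is added to the class.

```r
library(methods)

# Hypothetical class, version 1: a single numeric slot.
setClass("FooRecord", slots = c(x = "numeric"))
obj <- new("FooRecord", x = c(1, 2, 3))

# Serialize the object, as a data package does when it ships it.
# saveRDS() is used here; save() for .rda files also serializes the S4 object as-is.
path <- tempfile(fileext = ".rds")
saveRDS(obj, path)

# Version 2 of the class gains a slot; the stored object predates it.
setClass("FooRecord", slots = c(x = "numeric", units = "character"))
old <- readRDS(path)

# With test = TRUE, validObject() returns a character description of the
# problems instead of signalling an error; here it reports the missing slot.
problems <- validObject(old, test = TRUE)
print(problems)
```

Accessing `old@units` on the restored object fails with an error, which is why the stored objects have to be regenerated whenever the class changes.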

One issue with Hervé's recommendation arises when the same data structure is
used in several examples.  In that case, the conversion/parsing overhead is
multiplied by the number of examples.  For instance, minfiData contains data
on 6 samples from a somewhat large array.  Parsing 3 of the 6 raw data files
takes 16 sec (you can reproduce this timing, because this is what
example(read.450k.exp) does).  Loading all 6 arrays as an R data structure
takes 1.1 sec.
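The two paths being compared can be sketched as follows.  This assumes a hypothetical class `SampleSet` and parser `parseSampleSet()` (not minfi's classes or functions) and uses synthetic data, so the timings will not match the minfiData figures.

```r
library(methods)

# Hypothetical stand-in for an S4 data class.
setClass("SampleSet", slots = c(values = "matrix", samples = "character"))

# Hypothetical parser: builds the S4 object from a plain-text file.
parseSampleSet <- function(path) {
  df <- utils::read.csv(path, check.names = FALSE)
  new("SampleSet", values = as.matrix(df), samples = colnames(df))
}

# Synthetic raw data: 6 samples, 50,000 rows (size chosen arbitrarily).
set.seed(1)
raw <- as.data.frame(matrix(runif(6 * 5e4), ncol = 6,
                            dimnames = list(NULL, paste0("S", 1:6))))
csvPath <- tempfile(fileext = ".csv")
utils::write.csv(raw, csvPath, row.names = FALSE)

# Path 1: parse the raw file each time an example needs the object.
tParse <- system.time(objParsed <- parseSampleSet(csvPath))

# Path 2: load an already-built, serialized object.
rdsPath <- tempfile(fileext = ".rds")
saveRDS(objParsed, rdsPath)
tLoad <- system.time(objLoaded <- readRDS(rdsPath))

print(c(parse = tParse[["elapsed"]], load = tLoad[["elapsed"]]))
```

If k examples each run path 1, the parsing cost is paid k times, which is the multiplication described above.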

I would generally recommend that a data package either include the data in
a rawer form or provide a script that makes the data easily retrievable.
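One way to ship the rawer form is sketched below, assuming a hypothetical class `FooSeries`, a hypothetical loader `readFoo()`, and a placeholder file `foo.csv`.  Files placed under `inst/extdata/` in a package source are installed to `extdata/` and can be located with `system.file()`; the S4 object is then built on demand.

```r
library(methods)

# Hypothetical S4 class that would be defined in Foo.
setClass("FooSeries", slots = c(time = "numeric", value = "numeric"))

# Hypothetical loader. By default it looks for a placeholder raw file
# installed with Foodata; system.file() returns "" if it is not found.
readFoo <- function(path = system.file("extdata", "foo.csv", package = "Foodata")) {
  if (!nzchar(path) || !file.exists(path))
    stop("raw data file not found: is Foodata installed?")
  df <- utils::read.csv(path)
  new("FooSeries", time = as.numeric(df$time), value = as.numeric(df$value))
}

# Usage with a temporary plain-text file standing in for the shipped one.
tmp <- tempfile(fileext = ".csv")
utils::write.csv(data.frame(time = 1:3, value = c(10.5, 11, 9.75)),
                 tmp, row.names = FALSE)
foo <- readFoo(tmp)
print(foo)
```

Because only `readFoo()` knows the class layout, a change to the class internals means updating the loader in Foo rather than regenerating stored objects.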

Best,
Kasper


On Tue, Jan 28, 2014 at 8:01 PM, Hervé Pagès <hpa...@fhcrc.org> wrote:

> Hi Daniel,
>
>
> On 01/28/2014 03:49 PM, Daniel Kelley wrote:
>
>> I have an issue with a circular package dependency that prevents
>> building/checking, and I seek advice on breaking the circle so the packages
>> can pass the build/check tests that are required for CRAN submission.
>>
>> The package pair I'm working with is slow to build, but my tests suggest
>> the issue may be general, and so I will explain it in general terms.
>>
>> Suppose there are two packages:
>>
>> 1. Foo, a package that defines some data types with S4 classes.
>>
>> 2. Foodata, a package that provides such datasets, for use by Foo.
>>
>> With this setup, it seems reasonable that Foo "depends" on Foodata, so
>> the data can be used in Foo and its documentation.
>>
>> Since the data within Foodata are S4 objects of classes defined in Foo, an
>> attempt to build/check Foodata will produce an error unless Foo is present.
>> But Foo cannot be built unless Foodata exists, since it depends on it.
>> Thus neither Foo nor Foodata can be built and checked.
>>
>
> I've learned by experience that it's generally better (although not
> always possible) to avoid putting serialized S4 objects in a data
> package. They will break if you need to modify the internals of the
> class even slightly (and chances are high that you will at some
> point). Better to store the data in a format that is more or less
> guaranteed to remain the same for years (SQLite, XML, HDF5, plain text,
> serialized data frame, SAM/BAM, etc.) and try to come up with
> a fast way to load and turn the data into an S4 object on demand.
>
> This is not always possible if the data is huge, but for the purpose of
> using it in Foo's examples and vignette, do you really need huge data?
>
> Another advantage of this approach is that the data can then be
> more easily shared because it can be accessed with tools other
> than yours, e.g. tools that don't know about S4 and even non-R
> tools.
>
> Cheers,
> H.
>
>
>> One solution would be to wrap the Foo documentation examples (and
>> relevant Foo code) in require() blocks, and to make Foo "suggest" Foodata,
>> not "depend" upon it.  My question is whether this is the recommended
>> practice, or the common practice.
>>
>> Thanks in advance to anyone who wishes to offer hints.
>>
>> PS. The problem arose from an attempt to reduce CRAN load by extracting
>> the datasets that had been contained within a previous version of Foo.
>>
>> PPS. My (slow-building) packages are on GitHub, and I can supply details
>> if needed.
>>
>> Dan E. Kelley
>> Professor, Oceanography Department
>> Dalhousie University, Canada
>> dan.kel...@dal.ca
>>
>> ______________________________________________
>> R-devel@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>
>>
> --
> Hervé Pagès
>
> Program in Computational Biology
> Division of Public Health Sciences
> Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N, M1-B514
> P.O. Box 19024
> Seattle, WA 98109-1024
>
> E-mail: hpa...@fhcrc.org
> Phone:  (206) 667-5791
> Fax:    (206) 667-1319
>
>
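
For reference, the Suggests-based workaround Dan asks about can be sketched as below.  Writing R Extensions recommends guarding uses of suggested packages with `requireNamespace()`; `fooExample` is a hypothetical dataset name, and the DESCRIPTION line is shown as a comment.

```r
# In Foo's DESCRIPTION, Foodata moves from Depends to Suggests:
#   Suggests: Foodata
#
# In an Rd \examples section, a test, or a vignette chunk, run the
# data-dependent code only when Foodata is installed.
haveFoodata <- requireNamespace("Foodata", quietly = TRUE)
if (haveFoodata) {
  utils::data("fooExample", package = "Foodata")  # hypothetical dataset
  print(summary(fooExample))
} else {
  message("Foodata is not installed; skipping this example")
}
```

Installing a package does not require the packages it suggests, so Foo can be installed first, then Foodata, and then Foo can be checked.  By default R CMD check expects suggested packages to be available, so Foodata should be installed before checking Foo.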
