This is a great comment if the primary use of the data is to make the data available.
A change in the internals of the class structure clearly requires changing the data package, and that is a drawback to my recommendation. I have had to do this on several occasions.

One issue with Hervé's recommendation arises when the same data structure is used in several examples. In that case, the conversion/parsing overhead is multiplied by the number of examples. As an example, in minfiData I have data on 6 samples from a somewhat large array. Parsing the raw data files for 3 of the 6 samples takes 16 secs (I know this timing because it is what I have in example(read.450k.exp)). Loading all 6 arrays as an R data structure takes 1.1 sec.

I would generally recommend that a data package either include a more raw form of the data or have a script which makes the data easily retrievable.

Best,
Kasper

On Tue, Jan 28, 2014 at 8:01 PM, Hervé Pagès <hpa...@fhcrc.org> wrote:

> Hi Daniel,
>
>
> On 01/28/2014 03:49 PM, Daniel Kelley wrote:
>
>> I have an issue with a circular package dependence that prevents
>> building/checking, and I seek advice on breaking the circle so the packages
>> can pass the build-check tests that are required for CRAN submission.
>>
>> The package pair I'm working with is slow to build, but my tests suggest
>> the issue may be general, and so I will explain it in general terms.
>>
>> Suppose there are two packages:
>>
>> 1. Foo, a package that defines some data types with S4 classes.
>>
>> 2. Foodata, a package that provides such datasets, for use by Foo.
>>
>> With this setup, it seems reasonable that Foo "depends" on Foodata, so
>> the data can be used in Foo and its documentation.
>>
>> Since the data within Foodata are S4 classes as defined in Foo, an
>> attempt to build-check Foodata will produce an error unless Foo is present.
>> But Foo cannot be built unless Foodata exists, since it depends on it.
>> Thus neither Foo nor Foodata can be built and checked.
>>
>
> I've learned by experience that it's generally better (although not
> always possible) to avoid putting serialized S4 objects in a data
> package. They will break if you need to modify a little bit the
> internals of the class (and chances are high that you will at some
> point). Better to store the data in a format that is more or less
> guaranteed to remain the same for years (SQLite, XML, hdf5, plain text,
> serialized data frame, SAM/BAM, etc...) and try to come up with
> a fast way to load and turn the data into an S4 object on demand.
>
> Not always possible if the data is huge... but for the purpose of using
> it in Foo's examples and vignette do you really need huge data?
>
> Another advantage of this approach is that the data can then be
> more easily shared because it can be accessed with tools other
> than yours, e.g. tools that don't know about S4 and even non-R
> tools.
>
> Cheers,
> H.
>
>
>> One solution would be to wrap the Foo documentation examples (and
>> relevant Foo code) in require() blocks, and to make Foo "suggest" Foodata,
>> not "depend" upon it. My question is whether this is the recommended
>> practice, or the common practice.
>>
>> Thanks in advance to anyone who wishes to offer hints.
>>
>> PS. The problem arose from an attempt to reduce CRAN load by extracting
>> the datasets that had been contained within a previous version of Foo.
>>
>> PPS. my (slow-building) packages are on github and I can supply details
>> if needed.
>>
>> Dan E. Kelley
>> Professor, Oceanography Department
>> Dalhousie University, Canada
>> dan.kel...@dal.ca
>>
>> ______________________________________________
>> R-devel@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>
>
> --
> Hervé Pagès
>
> Program in Computational Biology
> Division of Public Health Sciences
> Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N, M1-B514
> P.O. Box 19024
> Seattle, WA 98109-1024
>
> E-mail: hpa...@fhcrc.org
> Phone: (206) 667-5791
> Fax: (206) 667-1319