Chris,

On Jan 7, 2013, at 6:23 PM, Chris Jewell wrote:

> Hi All,
> 
> I'm currently trying to write an S4 class that mimics a data.frame, but 
> stores data on disc in HDF5 format.  The idea is that the dataset is likely 
> to be too large to fit into a standard desktop machine, and by using 
> subscripts, the user may load bits of the dataset at a time, e.g.:
> 
>> myLargeData <- LargeData("/path/to/file")
>> mySubSet <- myLargeData[1:10, seq(1,15,by=3)]
> 
> I've therefore defined my LargeData class thus:
> 
>> LargeData <- setClass("LargeData", representation(filename="character"))
>> setMethod("initialize", "LargeData", function(.Object, filename) {
>>   .Object@filename <- filename
>>   .Object  # an initialize method must return the modified object
>> })
> 
> I've then defined the "[" method to call a C++ function (Rcpp), opening the 
> HDF5 file, and returning the required rows/cols as a data.frame.
> 
> However, what if the user wants to load the entire dataset into memory?  
> Which method do I overload to achieve the following?
> 
>> fullData <- myLargeData
>> class(fullData)
> [1] "data.frame"
> 

That makes no sense: a <- b is not a transformation, so "a" will have the 
same value as "b" by definition, and thus the same class. If you really meant

fullData <- as.data.frame(myLargeData)

then you just need to implement an as.data.frame() method for your class.
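
A minimal sketch of such a method, assuming the LargeData class from the 
quoted message. The helper readAll() is hypothetical: it stands in for the 
Rcpp/HDF5 reader and returns a small in-memory data frame so that the example 
runs without a file.

```r
library(methods)

setClass("LargeData", representation(filename = "character"))

# Hypothetical stand-in for the HDF5 reader (not a real API): it ignores
# the file and returns a tiny data frame so the sketch is runnable.
readAll <- function(filename) {
  data.frame(x = 1:3, y = c(2.5, 3.5, 4.5))
}

# Coercion method: load everything and hand back a native data.frame.
# setMethod() turns base::as.data.frame into an S4 generic on first use.
setMethod("as.data.frame", "LargeData",
          function(x, row.names = NULL, optional = FALSE, ...) {
            readAll(x@filename)
          })

myLargeData <- new("LargeData", filename = "/path/to/file")
fullData <- as.data.frame(myLargeData)
class(fullData)
```

The explicit call keeps the expensive full load visible at the call site, 
which plain assignment could never do.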

Note, however, that a more common way to convert a big data reference to the 
native format in its entirety is simply myLargeData[]. You may want to have a 
look at the (many) existing big data packages (AFAIR bigmemory uses a C++ 
back-end as well). Also note that indexing is tricky in R and easy to get wrong 
(remember: negative indices, indexing by name, etc.).
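
One way to sketch a "[" method that covers empty, negative and named indices 
is to let R resolve the index against a vector of positions first, and only 
then ask the back-end for those positions. Everything below is an assumption 
for illustration: "backing" is a hypothetical in-memory stand-in for the file 
on disc, and resolveIndex() is a made-up helper name.

```r
library(methods)

setClass("LargeData", representation(filename = "character"))

# Hypothetical stand-in for the on-disc data (a real back-end would only
# know the dimensions and column names until cells are requested).
backing <- data.frame(a = 1:4, b = 5:8, c = 9:12)

# Turn any valid R index (NULL for "missing", positive, negative, logical
# or character) into plain positive positions by indexing a position vector.
resolveIndex <- function(idx, n, nms = NULL) {
  pos <- seq_len(n)
  if (!is.null(nms)) names(pos) <- nms
  if (is.null(idx)) return(unname(pos))
  unname(pos[idx])
}

setMethod("[", "LargeData", function(x, i, j, ..., drop = TRUE) {
  rowIdx <- if (missing(i)) NULL else i
  colIdx <- if (missing(j)) NULL else j
  rows <- resolveIndex(rowIdx, nrow(backing))
  cols <- resolveIndex(colIdx, ncol(backing), names(backing))
  # A real implementation would read only these rows/cols from the file.
  backing[rows, cols, drop = FALSE]
})

myLargeData <- new("LargeData", filename = "/path/to/file")
everything <- myLargeData[]          # both indices missing: full data
subset <- myLargeData[-1, "b"]       # negative row index, column by name
```

This sketch does not validate out-of-range or unknown names (they resolve to 
NA), which a real method would have to check before touching the file.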


> or apply transformations:
> 
>> myEigen <- eigen(myLargeData)
> 
> In C++ I would normally overload the "double" or "float" operator to achieve 
> this -- can I do the same thing in R?
> 

Again, there is no implicit coercion in R (you cannot declare a variable's type 
in advance), so the C++ approach you have in mind doesn't carry over. The 
closest equivalent in R is simply implementing an as.double() method, but I 
suspect that's not what you had in mind. For generics you can simply implement 
a method for your class (one that does the coercion, for example, or uses a 
more efficient way). If you cannot define a generic or don't want to write your 
own methods, then it's a problem: the only theoretical way is to subclass the 
numeric vector class, but that is not possible in R if you want to change the 
representation, because it falls through to the more efficient internal code 
too quickly (without extra dispatch) for you.
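
A rough sketch of both options, explicit coercion and a method on an existing 
generic. The helper readValues() is hypothetical and returns fixed numbers in 
place of reading the file; mean() is used only as a convenient example of a 
function that can be given a method.

```r
library(methods)

setClass("LargeData", representation(filename = "character"))

# Hypothetical stand-in for reading every value from the file
readValues <- function(filename) c(1.5, 2.5, 3.5)

# Explicit coercion. S4 methods for as.double() are set on as.numeric().
setMethod("as.numeric", "LargeData", function(x, ...) {
  readValues(x@filename)
})

# A method for an existing function: coerce, then delegate to the
# default method for plain numeric vectors.
setMethod("mean", "LargeData", function(x, ...) {
  mean(as.numeric(x), ...)
})

myLargeData <- new("LargeData", filename = "/path/to/file")
vals <- as.numeric(myLargeData)
avg <- mean(myLargeData)
```

Here the caller still has to ask for the coercion or call a function that has 
a method; nothing is converted behind the scenes.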

Cheers,
Simon


> Thanks,
> 
> Chris
> 
> ______________________________________________
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
> 
> 

