Hi there,

Problem ::
When one tries to change one or more columns of a data.frame, R makes
a copy of the whole data.frame through the '*tmp*' mechanism. This does not
happen for components of a list; tracemem( ) on R-2.6.2 confirms this.
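
A minimal sketch of how the copying can be observed with tracemem( ). The
object names df and ll are made up for illustration, and what gets reported
depends on the R version and on whether R was built with memory profiling,
so no expected output is shown:

```r
# Sketch: trace duplications while replacing a single column.
df <- data.frame(xx = rnorm(5), yy = rnorm(5))
ll <- list(xx = rnorm(5), yy = rnorm(5))

# tracemem( ) needs an R build with memory profiling enabled.
if (capabilities("profmem")) {
    tracemem(df)
    df$xx <- df$xx * 2    # a "tracemem[...]" line signals a duplication
    untracemem(df)

    tracemem(ll)
    ll$xx <- ll$xx * 2    # compare: is anything reported for the plain list?
    untracemem(ll)
}
```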


Suggested solution ::
Store the columns of the data.frame as a list inside of an environment slot
of an S4 class, and define the '[', '[<-' etc. operators using setMethod( )
and setReplaceMethod( ).
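
A rough sketch of what such methods could look like. The class EnvFrame and
its constructor are hypothetical stand-ins so that the example runs on its
own, and only selection by column name is handled; this is an assumption
about one possible design, not a finished implementation:

```r
library(methods)

# Hypothetical stand-in class: the columns live in an environment slot.
setClass("EnvFrame", representation(store = "environment"))

# Hypothetical constructor: one fresh environment per object.
EnvFrame <- function(df) {
    store <- new.env(hash = TRUE)
    assign("data", as.list(df), envir = store)
    new("EnvFrame", store = store)
}

# x["name"] returns the selected columns as a named list.
setMethod("[", signature(x = "EnvFrame", i = "character", j = "missing"),
          function(x, i, j, ..., drop = TRUE) {
              get("data", envir = x@store)[i]
          })

# x["name"] <- value writes the column(s) back into the environment.
setReplaceMethod("[", signature(x = "EnvFrame", i = "character", j = "missing"),
                 function(x, i, j, ..., value) {
                     cols <- get("data", envir = x@store)
                     cols[i] <- if (is.list(value)) value else list(value)
                     assign("data", cols, envir = x@store)
                     x
                 })

# Usage: 'alias' shares the environment, so it sees the replacement too.
dd <- EnvFrame(data.frame(xx = c(1, 2, 3), yy = c(4, 5, 6)))
alias <- dd
dd["xx"] <- c(10, 20, 30)
```

Note that this sketch reads the whole list out of the environment and writes
it back, which may itself duplicate the list depending on the R version; it
is meant to show the method signatures, not to be memory-optimal.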


Question ::
This implementation will violate the copy-on-modify principle of R (since
environments are not copied), but it will save a lot of memory. Do you see any
other obvious problem(s) with the idea? Have you seen a related setup
implemented or considered before (apart from packages like filehash and ff,
and the database-related ones, for saving memory)?
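
To make the copy-on-modify point concrete, here is a small sketch using plain
lists and environments (the names l1, l2, e1, e2 are made up for
illustration). A change made through a second name for an environment is
visible through the first, which is not the case for a list:

```r
# Lists follow copy-on-modify: l1 is unaffected by a change to l2.
l1 <- list(xx = c(1, 2, 3))
l2 <- l1
l2$xx[1] <- 100
list_first <- l1$xx[1]    # still 1

# Environments are not copied: e1 and e2 are the same object.
e1 <- new.env()
assign("data", list(xx = c(1, 2, 3)), envir = e1)
e2 <- e1
cols <- get("data", envir = e2)
cols$xx[1] <- 100
assign("data", cols, envir = e2)
env_first <- get("data", envir = e1)$xx[1]    # now 100
```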


Implementation code snippet ::
### The S4 class.
setClass('DataFrame',
         representation(data = 'data.frame', nrow = 'numeric',
                        ncol = 'numeric', store = 'environment'),
         prototype(data = data.frame( ), nrow = 0, ncol = 0))

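## '...' carries the slot values given to new( ), e.g. data = dd1.
setMethod('initialize', 'DataFrame', function(.Object, ...) {
    .Object <- callNextMethod(.Object, ...)
    ## A fresh environment per object holds the columns as a list.
    .Object@store <- new.env(hash = TRUE)
    assign('data', as.list(.Object@data), .Object@store)
    .Object@nrow <- nrow(.Object@data)
    .Object@ncol <- ncol(.Object@data)
    ## Drop the data.frame held in the slot; the columns now live in 'store'.
    .Object@data <- data.frame( )
    .Object
})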


### Usage:
nn  <- 10
## dd1 below could possibly be created by read.table or scan and data.frame
dd1 <- data.frame(xx = rnorm(nn), yy = rnorm(nn))
dd2 <- new('DataFrame', data = dd1)
rm(dd1)
## Now work with dd2


Thanks a lot,
Gopi Goswami.
PhD, Statistics, 2005
http://gopi-goswami.net/index.html


______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
