Hi there,
Problem ::
When one modifies one or more columns of a data.frame, R makes a copy of the
whole data.frame through the '*tmp*' mechanism (this does not happen for the
components of a list; tracemem() in R-2.6.2 confirms this).
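One way to watch the copies happen, assuming an R build with memory profiling enabled (tracemem() is not available otherwise). The messages printed differ between R versions, so none are reproduced here; compare what is reported for the data.frame with what is reported for the list:

```r
## Compare copying behaviour for a data.frame and a plain list.
## tracemem() needs R built with memory profiling, hence the guard.
traced <- capabilities("profmem")

dd <- data.frame(xx = rnorm(10), yy = rnorm(10))
ll <- list(xx = rnorm(10), yy = rnorm(10))

if (traced) invisible(tracemem(dd))   # report every duplication of 'dd'
dd$xx[1] <- 0                         # goes through `$<-.data.frame` and '*tmp*'
if (traced) untracemem(dd)

if (traced) invisible(tracemem(ll))   # report every duplication of 'll'
ll$xx[1] <- 0                         # list component, default `$<-` method
if (traced) untracemem(ll)
```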
Suggested solution ::
Store the columns of the data.frame as a list inside an environment slot of
an S4 class, and define the '[', '[<-', etc. operators using setMethod() and
setReplaceMethod().
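A minimal sketch of the suggested layout, using a hypothetical class EnvFrame and a hypothetical constructor EnvFrame(), both separate from the DataFrame class below. The '[' method assembles only the requested columns and rows; the '[<-' method (defined with setReplaceMethod()) writes one updated column back into the environment. Only the x[i, j] form is handled, and j must name a single column when assigning:

```r
library(methods)

## Hypothetical minimal class: the columns live in a list called 'data'
## inside an environment.
setClass('EnvFrame', representation(store = 'environment'))

## Hypothetical constructor: move the columns of 'df' into a fresh environment.
EnvFrame <- function(df) {
    store <- new.env(hash = TRUE)
    assign('data', as.list(df), envir = store)
    new('EnvFrame', store = store)
}

## x[i, j]: pick the columns first, then the rows, so columns that were
## not asked for are never touched.
setMethod('[', 'EnvFrame', function(x, i, j, ..., drop = TRUE) {
    cols <- get('data', envir = x@store)
    if (!missing(j)) cols <- cols[j]
    if (!missing(i)) cols <- lapply(cols, `[`, i)
    if (drop && length(cols) == 1L) cols[[1L]] else as.data.frame(cols)
})

## x[i, j] <- value: update column j (all rows, or rows i), store the
## list back in the environment and return the same handle.
setReplaceMethod('[', 'EnvFrame', function(x, i, j, ..., value) {
    cols <- get('data', envir = x@store)
    if (missing(i)) cols[[j]] <- value else cols[[j]][i] <- value
    assign('data', cols, envir = x@store)
    x
})

## Usage
ef <- EnvFrame(data.frame(xx = 1:5, yy = 6:10))
ef[2, 'yy'] <- 0L
ef[, 'yy']      # 6 0 8 9 10
ef[1:2, ]       # a small ordinary data.frame
```

The assignment still copies the small S4 handle through '*tmp*', but the column vectors themselves stay in the environment.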
Question ::
This implementation will violate R's copy-on-modify semantics (since
environments are not copied), but it will save a lot of memory. Do you see any
other obvious problem(s) with the idea? Have you seen a related setup
implemented or considered before (apart from packages like filehash and ff,
and database-related ones, for saving memory)?
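The copy-on-modify concern can be shown without any S4 code: assigning an environment copies the reference, not its contents, so an update made through one name is visible through the other. The same applies to dd3 <- dd2 with the proposed class, since both objects' store slots then point to one environment. copyStore() below is a hypothetical helper, not part of any package:

```r
## Two handles to one environment: assignment copies the reference,
## not the columns, so an update through one handle is seen by both.
e1 <- new.env(hash = TRUE)
assign('data', list(xx = c(1, 2, 3)), envir = e1)
e2 <- e1                                   # no copy of the contents
cols <- get('data', envir = e2)
cols$xx[1] <- 99
assign('data', cols, envir = e2)
get('data', envir = e1)$xx[1]              # 99 here as well

## Hypothetical helper: copy the stored list into a new environment to get
## value semantics back when they are wanted.
copyStore <- function(env) {
    out <- new.env(hash = TRUE)
    assign('data', get('data', envir = env), envir = out)
    out
}
e3 <- copyStore(e1)
assign('data', list(xx = 0), envir = e3)   # leaves e1 untouched
```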
Implementation code snippet ::
### The S4 class.
setClass('DataFrame',
         representation(data = 'data.frame', nrow = 'numeric',
                        ncol = 'numeric', store = 'environment'),
         prototype(data = data.frame(), nrow = 0, ncol = 0))
setMethod('initialize', 'DataFrame', function(.Object, ...) {
    ## pass 'data = ...' and any other slot arguments to the default method
    .Object <- callNextMethod(.Object, ...)
    .Object@store <- new.env(hash = TRUE)
    assign('data', as.list(.Object@data), envir = .Object@store)
    .Object@nrow <- nrow(.Object@data)
    .Object@ncol <- ncol(.Object@data)
    .Object@data <- data.frame()   # drop the copy held in the slot
    .Object
})
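One further problem worth flagging: nrow and ncol are ordinary slots, copied along with the object, while the columns live in the shared environment, so the cached counts go stale as soon as a column is added through another handle. A sketch of a dim() method that reads the dimensions from the store instead; the class and initialize method are repeated (with initialize taking ... so that new() can pass data = on) so the block runs on its own:

```r
library(methods)

## The class and initialize method from above, repeated so this runs alone.
setClass('DataFrame',
         representation(data = 'data.frame', nrow = 'numeric',
                        ncol = 'numeric', store = 'environment'),
         prototype(data = data.frame(), nrow = 0, ncol = 0))
setMethod('initialize', 'DataFrame', function(.Object, ...) {
    .Object <- callNextMethod(.Object, ...)
    .Object@store <- new.env(hash = TRUE)
    assign('data', as.list(.Object@data), envir = .Object@store)
    .Object@nrow <- nrow(.Object@data)
    .Object@ncol <- ncol(.Object@data)
    .Object@data <- data.frame()
    .Object
})

## dim() computed from the shared store rather than the cached slots;
## nrow() and ncol() call dim(), so they follow automatically.
setMethod('dim', 'DataFrame', function(x) {
    cols <- get('data', envir = x@store)
    c(if (length(cols)) length(cols[[1L]]) else 0L, length(cols))
})

dd2 <- new('DataFrame', data = data.frame(xx = rnorm(10), yy = rnorm(10)))
dd3 <- dd2                                  # shares dd2's store
cols <- get('data', envir = dd3@store)
cols$zz <- rnorm(10)                        # add a column via dd3
assign('data', cols, envir = dd3@store)
dd2@ncol                                    # stale cached value: 2
ncol(dd2)                                   # from the store: 3
```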
### Usage:
nn <- 10
## dd1 below could possibly be created by read.table or scan and data.frame
dd1 <- data.frame(xx = rnorm(nn), yy = rnorm(nn))
dd2 <- new('DataFrame', data = dd1)
rm(dd1)
## Now work with dd2
Thanks a lot,
Gopi Goswami.
PhD, Statistics, 2005
http://gopi-goswami.net/index.html
______________________________________________
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel