Thomas Lumley wrote:
On Mon, 23 Aug 2004, Tony Plate wrote:

One idea I was thinking about was to have a new class of object that
referred to data in a file on disk, and which had all the standard methods
of matrices and arrays, i.e., subsetting ("["), dim, dimnames, etc.

This is what RPgSql does with proxy dataframes and what I did (read-only) for netCDF access. It's a good idea if you have a data format for which random access is fairly fast. I'm not sure that the standard serialized binary format satisfies this. Fixed-format text files would work, but free-format ones wouldn't -- seek() only helps when you can work out where to seek without reading all the data.
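To illustrate the point about seek(), here is a minimal sketch assuming a made-up fixed-width file in which every record is exactly 8 bytes (7 digits plus a newline). The file layout and the helper name read_record are assumptions for illustration only, not part of any package mentioned in this thread. Because the record width is known, the byte offset of record i can be computed directly and nothing before it has to be read.

```r
# Build a small fixed-width file: each record is 7 characters plus "\n",
# so record i starts at byte offset (i - 1) * 8.
path <- tempfile(fileext = ".txt")
records <- sprintf("%07d", c(11L, 22L, 33L, 44L, 55L))
con <- file(path, "wb")                      # binary mode: no newline translation
writeBin(charToRaw(paste0(records, "\n", collapse = "")), con)
close(con)

# Hypothetical helper: fetch record i without reading the records before it.
read_record <- function(path, i, width = 8L) {
  con <- file(path, "rb")
  on.exit(close(con))
  seek(con, where = (i - 1) * width, origin = "start")
  bytes <- readBin(con, what = "raw", n = width - 1L)  # leave out the newline
  as.integer(rawToChar(bytes))
}

read_record(path, 4)
```

With a free-format file the offset of record 4 would depend on the lengths of records 1 to 3, so the same trick would not be available without scanning or building an index first.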

Just to join in on the 'done it' threads here, this is what my Rmap package does with DBF files (they are the database component of ESRI Shapefile maps). I use the dbf library from shapelib to access a DBF file just like a data frame.


My dbf objects keep track of which rows and columns of the database file are selected, so it's possible to do:

 db1 = db[1:10,]

and db1 is still a proxy object to the same DBF file as db, but with attributes that tell it that it only has rows 1 to 10 in it. If you really want a data frame, you just as.data.frame() it.
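The general shape of such a proxy can be sketched with a small S3 class. This is not Rmap's actual code: the class name file_proxy, its fields, and the use of a CSV file in place of a DBF file are all assumptions made so the example is self-contained. The object holds only a path plus row and column selections, subsetting returns another proxy, and data is read only when as.data.frame() is called.

```r
# Hypothetical constructor: stores a path and index vectors, not the data.
# (Counting lines reads the file once here; a DBF header would give the
# row count directly.)
file_proxy <- function(path) {
  header <- read.csv(path, nrows = 1)
  n <- length(readLines(path)) - 1L          # rows, excluding the header line
  structure(list(path = path, rows = seq_len(n), cols = names(header)),
            class = "file_proxy")
}

# Subsetting narrows the stored selections and returns another proxy.
"[.file_proxy" <- function(x, i, j) {
  if (!missing(i)) x$rows <- x$rows[i]
  if (!missing(j)) {
    if (is.character(j)) j <- match(j, x$cols)
    x$cols <- x$cols[j]
  }
  x
}

dim.file_proxy <- function(x) c(length(x$rows), length(x$cols))

# Materialise the selection. read.csv() is a stand-in that loads the whole
# file; a format with fast random access would fetch only the selected rows.
as.data.frame.file_proxy <- function(x, ...) {
  full <- read.csv(x$path)
  out <- full[x$rows, x$cols, drop = FALSE]
  rownames(out) <- NULL
  out
}

# Usage with a throwaway file
path <- tempfile(fileext = ".csv")
write.csv(data.frame(id = 1:20, x = (1:20)^2), path, row.names = FALSE)
db  <- file_proxy(path)
db1 <- db[1:10, ]            # still a proxy, restricted to rows 1 to 10
db2 <- db1[c(2, 4), "x"]     # selections compose: file rows 2 and 4, column x
dim(db1)
as.data.frame(db2)
```

Storing the selection as indices into the original file means repeated subsetting never copies data; the cost is paid once, at the point where a real data frame is requested.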

If you wanted to do this sort of thing for space-saving reasons you'd have to be very careful, since for some operations R might slurp it all into memory.

Baz

http://www.maths.lancs.ac.uk/Software/Rmap/

______________________________________________
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
