RE: Compression

Hi R-Users,

You can deal with decent-sized data sets in R on a relatively new computer.  
I have been working with one that is a nearly 100MB plain text file.  With 
storage as inexpensive as it is these days, that's not really all that much 
data and I could store it just as it is.

Having said that, you may want to compress those data files.  There are two 
reasons for this.  One is that while storage is cheap, large files can be 
harder to move around.  Emailing a large data file, for example, is difficult.  
The other is that, even though your drive has lots of capacity and seems 
pretty fast, it is in fact much, much slower than your CPU.  Recognizing this, 
Apple built transparent background compression into the file system in its 
latest operating system.  It speeds things up because the computer spends less 
time accessing the disk.  It spends more time using the CPU to compress and 
decompress the data, but the CPU usually wasn't doing anything anyway, and 
remember, it's much faster.

It turns out R added transparent decompression for certain kinds of compressed 
files in the latest version (2.10).  If you have your files compressed with 
bzip2, xz, or gzip, they can be read into R as if they were plain text files.  
You should have the proper filename extensions.

For example, the command

myData <- read.table('myFile.gz')  #gzip compressed files have a "gz" extension

will work just as if 'myFile.gz' were the raw text file.
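
If you want to convince yourself of this, here is a small sketch you can run.  
The data frame and the temporary file names are made up for illustration, and 
it assumes R 2.10 or later.  It writes the same table once as plain text and 
once through a gzip connection, then reads both back with the same call.

```r
# Made-up example data; any data frame would do.
df <- data.frame(id = 1:1000, value = rep(c(0.5, 1.5), 500))

# Write the table as an ordinary plain text file.
plain <- tempfile(fileext = ".txt")
write.table(df, plain, row.names = FALSE)

# Write the same table through a gzip connection.
gz <- tempfile(fileext = ".gz")
con <- gzfile(gz, "w")
write.table(df, con, row.names = FALSE)
close(con)

# read.table() is called the same way on both files.
fromPlain <- read.table(plain, header = TRUE)
fromGz <- read.table(gz, header = TRUE)

# Compare the two results.
identical(fromPlain, fromGz)
```

The last line should report whether the two reads gave the same data frame.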

This is very handy for distributing your analysis scripts and data to 
co-workers at the same time, and it saves you space on your hard drive.  More 
importantly, it saves you space on flash drives and backups.

John

PS:  My 100MB data file compresses to 500KB.  That's tiny these days.  Programs 
to do the kinds of compression R accepts are available for free.  If you have 
a Mac or Linux computer they are already available at the command line and are 
as simple as typing gzip 'filename'.

Bonus tip:  R has built-in facilities for writing out the compressed files as 
well.  "?connections" gets you the help page with basic info.
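
As a rough sketch of what that looks like, the connection functions gzfile(), 
bzfile() and xzfile() can be handed to write.table() in place of a file name.  
The helper writeCompressed() and the example data below are my own inventions 
for illustration, not anything built into R, and xzfile() assumes your R was 
built with xz support.

```r
# Made-up example data.
df <- data.frame(id = 1:5000, value = rep(c(0.5, 1.5), 2500))

# Hypothetical helper: open a compressed connection, write, always close.
writeCompressed <- function(data, path, opener) {
  con <- opener(path, "w")
  on.exit(close(con))
  write.table(data, con, row.names = FALSE)
}

# One temporary file per compression format.
paths <- c(gz  = tempfile(fileext = ".gz"),
           bz2 = tempfile(fileext = ".bz2"),
           xz  = tempfile(fileext = ".xz"))

writeCompressed(df, paths[["gz"]],  gzfile)
writeCompressed(df, paths[["bz2"]], bzfile)
writeCompressed(df, paths[["xz"]],  xzfile)

# Plain text copy for comparison.
plain <- tempfile(fileext = ".txt")
write.table(df, plain, row.names = FALSE)

# File sizes in bytes: plain text first, then gz, bz2, xz.
sizes <- file.info(c(plain, paths))$size
sizes
```

How much you save depends on the data, so try it on your own files before 
settling on one format.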

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
