R is so smart! I found that when you switch a column from numeric to factor, the memory consumption goes down rather impressively.
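A quick way to check this on a single vector, rather than on a whole data frame, is to compare the bytes per element of the same 0/1 data held as a double (numeric) vector, an integer vector, and a factor. This is only a sketch: the names n, x_num, x_int, x_fac and bytes_per_element are made up for the illustration, and the exact byte counts can vary a little with the R version and platform.

```r
# Same 0/1 data stored three ways (all names here are illustrative)
set.seed(1)
n <- 1e5
x_num <- as.numeric(runif(n) > 0.5)  # double vector
x_int <- as.integer(x_num)           # integer vector
x_fac <- factor(x_num)               # factor with 2 levels

# Approximate storage per element, including the small fixed header
bytes_per_element <- function(v) as.numeric(object.size(v)) / length(v)

sizes <- c(numeric = bytes_per_element(x_num),
           integer = bytes_per_element(x_int),
           factor  = bytes_per_element(x_fac))
print(round(sizes, 2))

# How the factor's codes are stored internally
print(typeof(x_fac))
```

If the factor comes out at about the same size as the integer vector and roughly half the size of the numeric one, the saving comes from the storage type of the codes and not from any bit-level packing.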
Now I'd like to learn more. How does R do this, what exactly does it do, and where can I read about it?

I got to thinking: if I were really smart, I'd see that a factor with 2 levels requires only 1 bit of storage per value, so I'd be able to cram 8 such values into a byte. But this would come at the price of more complex code, since reading and writing that object would require sub-byte operations. Does R go this far? I think not, given the more modest gains that I see. Does it go down to a byte per value? A four-byte word instead of 8 bytes of storage? What are Ncells and Vcells, and what determines R's memory consumption for each kind?

If you're curious about this, here's a program that serves as a demo:

    x <- matrix(as.numeric(runif(1e6) > .5), nrow=100000)
    D <- data.frame(x)
    rm(x)

    # Take stock:
    gc()
    sum(gc()[,2])
    object.size(D)

    # Switch to factors --
    D$X1 <- factor(D$X1); D$X2 <- factor(D$X2); D$X3 <- factor(D$X3)
    D$X4 <- factor(D$X4); D$X5 <- factor(D$X5); D$X6 <- factor(D$X6)
    D$X7 <- factor(D$X7); D$X8 <- factor(D$X8); D$X9 <- factor(D$X9)
    D$X10 <- factor(D$X10)

    # Take stock:
    gc()
    sum(gc()[,2])
    object.size(D)

Using this, I find that the cost of these 10 vectors goes down from 12 MB to 8 MB. This suggests savings, but not the dramatic impact of recognising that a factor with 2 levels only requires 1 bit per value.

-- 
Ajay Shah, Consultant
Department of Economic Affairs, Ministry of Finance, New Delhi
http://www.mayin.org/ajayshah
