Hi,
I'm responding to the question about a storage error when trying to read a
3,000,000 x 100 dataset into a data.frame.
I wonder whether you can read the data as strings. If the numbers are all one
digit, each cell would require just 1 byte instead of 8, so the table takes
about 300 MB instead of 2.4 GB. You can run crosstabs on the character values
just as easily as on numeric ones. If you need numeric values, convert them a
few columns at a time using as.numeric(). Here's an example --
# Generate some data and write it to a text file
library(MASS)                        # for mvrnorm()
v <- rnorm(5, 0, 0.7)
C_xx <- diag(v^2) + v %o% v          # a positive-definite covariance matrix
C_xx
mu <- rep(5, 5)
X.dat <- data.frame(round(mvrnorm(250, mu, C_xx)))
head(X.dat)
write.table(X.dat, "X.dat")
# Read the data back with scan() as character, then convert to a data.frame
# (skip = 1 skips the header row; each row is a row name plus 5 values)
Xstr.dat <- matrix(scan("X.dat", what = "character", skip = 1),
                   nrow = 250, ncol = 6, byrow = TRUE)
Xstr.dat <- as.data.frame(Xstr.dat[, 2:6], stringsAsFactors = FALSE)
head(Xstr.dat)
# Run a crosstab on two of the character columns
attach(Xstr.dat)
table(V1, V2)
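To get numbers back for arithmetic a few columns at a time, as.numeric()
works directly on a character vector. A minimal standalone sketch (the
column names and values here are illustrative, not from the file above):

```r
# Convert selected character columns to numeric only when arithmetic is
# needed, keeping the bulk of the table stored as strings.
Xstr <- data.frame(V1 = c("4", "5", "6"),
                   V2 = c("5", "5", "4"),
                   stringsAsFactors = FALSE)
v1 <- as.numeric(Xstr$V1)   # numeric copy of just one column
mean(v1)                    # 5
```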
You probably do not need the option "stringsAsFactors=FALSE". Without it, the
strings are converted to factors; since a factor stores small integer codes
plus a table of levels, that should not increase the storage required.
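For what it's worth, storage questions like this can be checked directly
with object.size(). A quick sketch, assuming a 64-bit build (exact byte
counts vary by R version and platform):

```r
# Compare storage for the same single-digit values in three forms
x.num <- as.numeric(sample(0:9, 100000, replace = TRUE))  # doubles: 8 bytes/cell
x.chr <- as.character(x.num)     # strings (shared via R's string cache)
x.fac <- factor(x.chr)           # 4-byte integer codes plus a levels table
object.size(x.num)
object.size(x.chr)
object.size(x.fac)
```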
Larry Hotchkiss
------------------------------------------------------------------------------------
Message: 6
Date: Tue, 10 Nov 2009 04:10:07 -0800 (PST)
From: maiya <[email protected]>
Subject: [R] Error: cannot allocate vector of size...
To: [email protected]
Message-ID: <[email protected]>
Content-Type: text/plain; charset=us-ascii
I'm trying to import a table into R; the file is about 700MB. Here's my first
try:
> DD<-read.table("01uklicsam-20070301.dat",header=TRUE)
Error: cannot allocate vector of size 15.6 Mb
In addition: Warning messages:
1: In scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
Reached total allocation of 1535Mb: see help(memory.size)
2: In scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
Reached total allocation of 1535Mb: see help(memory.size)
3: In scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
Reached total allocation of 1535Mb: see help(memory.size)
4: In scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
Reached total allocation of 1535Mb: see help(memory.size)
Then I tried
> memory.limit(size=4095)
and got
> DD<-read.table("01uklicsam-20070301.dat",header=TRUE)
Error: cannot allocate vector of size 11.3 Mb
but no additional errors. Then optimistically to clear up the workspace:
> rm()
> DD<-read.table("01uklicsam-20070301.dat",header=TRUE)
Error: cannot allocate vector of size 15.6 Mb
Can anyone help? I'm confused even by the values: 15.6 Mb, 1535 Mb, 11.3 Mb?
I'm working on WinXP with 2 GB of RAM. Help says the maximum obtainable
memory is usually 2Gb. Surely they mean GB?
The file I'm importing has about 3 million cases with 100 variables that I
want to crosstabulate each with each. Is this completely unrealistic?
Thanks!
Maja
--
______________________________________________
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.