On Fri, Aug 26, 2011 at 11:55 PM, Ben Bolker <bbol...@gmail.com> wrote:
> Scott <ncbi2r <at> googlemail.com> writes:
>
>> It does look like you've got a memory issue. Perhaps using
>> as.is=TRUE and/or stringsAsFactors=FALSE as optional arguments to
>> read.table will help.
>>
>> If you don't specify these sorts of things, R has to look through the
>> file and figure out which columns are characters/factors etc., so
>> larger files cause more of a headache for R, I'm guessing. Hopefully
>> someone else can comment further on this. I'd try toggling TRUE/FALSE
>> for as.is and stringsAsFactors.
>>
>> Do you have other objects loaded in memory as well? This file by
>> itself might not be the problem; it could be a cumulative issue.
>> Have you checked the file structure in any other manner?
>> How large (MB/kB) is the file that you're trying to read?
>> If you just read in parts of the file, is it okay?
>> read.table(filename, header=FALSE, sep="\t", nrows=100)
>> read.table(filename, header=FALSE, sep="\t", skip=20000, nrows=100)
>
> There seem to be two issues here:
>
> 1. What can the original poster (OP) do to work around this problem?
> (e.g. get the data into a relational database and import it from
> there; use something from the High Performance task view such as
> ff or data.table ...)
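(To make the quoted read.table advice concrete, here is a minimal sketch.
It assumes, purely for illustration, a tab-separated file "fil2_s.txt"
with 19 columns, one character id followed by 18 numeric fields; the file
name, column count, and types are guesses, not details from the original
post.)

head_chunk <- read.table("fil2_s.txt", header = FALSE, sep = "\t",
                         nrows = 100)            # cheap peek at the layout
str(head_chunk)

col_types <- c("character", rep("numeric", 18))  # assumed column types
dat <- read.table("fil2_s.txt", header = FALSE, sep = "\t",
                  colClasses = col_types,        # skip type guessing
                  stringsAsFactors = FALSE,      # no factor tables
                  comment.char = "", quote = "", # no comment/quote scanning
                  nrows = 11e6)  # slight overestimate of the row count, if known (omit otherwise)

Pre-declaring colClasses (and an approximate nrows) lets read.table allocate
storage once instead of rescanning the file to guess types, which is often
the cheapest thing to try before moving to ff, data.table, or a database.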
Interestingly, the text file was created by a selection from an SQL
database. I have access to 'db2' on an Ubuntu machine. I run, at the bash
prompt,

$ db2 < file2.sql

where file2.sql contains

connect to linnedb user goran using xxxxxxxxxxx
export to '/home/goran/ALC/SQL/fil2_s.txt' of del modified by coldelX09
  select linneid, fodelsear, kon, ....... from u09021.fil2
connect reset

How do I get a direct connection between R and the database 'linnedb'?
(A sketch follows at the end of this message.)

> 2. Reporting a bug -- according to the R FAQ, any low-level
> (segmentation-fault-type) crash of R when one is not messing
> around with dynamically loaded code constitutes a bug. Unfortunately,
> debugging problems like this is a huge pain in the butt.
>
> Goran, can you randomly or systematically generate an
> object of this size, write it to disk, read it back in, and
> generate the same error? In other words, does something like
>
> set.seed(1001)
> d <- data.frame(label=rep(LETTERS[1:11],1e6),
>                 values=matrix(rep(1.0,11*17*1e6),ncol=17))
> write.table(d,file="big.txt")
> read.table("big.txt")
>
> do the same thing?

No, but I get new errors:

> ss <- read.table("big.txt")
Error in read.table("big.txt") :
  duplicate 'row.names' are not allowed

(there are no duplicates). I tried to add an item to the first line, and

> ss <- read.table("big.txt", header = TRUE)
Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,  :
  line 10610008 did not have 19 elements

which is wrong; that line has 19 elements.

Göran

> Reducing it to this kind of reproducible example will make
> it possible for others to debug it without needing to gain
> access to your huge file ...

--
Göran Broström

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
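(On the question above about a direct connection between R and 'linnedb':
a minimal sketch using the RODBC package, assuming a DB2 ODBC driver is
installed and an ODBC data source named "linnedb" has been configured; the
DSN name, credentials, and query are placeholders taken from the export
script, not a verified setup.)

library(RODBC)

ch <- odbcConnect("linnedb", uid = "goran", pwd = "xxxxxxxxxxx")  # assumed DSN
dat <- sqlQuery(ch,
                "select linneid, fodelsear, kon from u09021.fil2",
                as.is = TRUE)   # keep character columns as characters
odbcClose(ch)

This returns the selection as a data frame directly, with no intermediate
text file. If no ODBC driver is available, the RJDBC package with IBM's DB2
JDBC driver is an alternative route.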