Hi Baptiste,
thanks a lot. Could you please comment on that code? I cannot figure out what it does. Apart from the file name, what parameters does it need? It seems to me that you need to know the size of the table a priori. Is that right? Do you have to set the block size depending on that (so that you get full multiples of the block to form the resulting frame)?
Cheers
Martin

On 8/12/2010 2:45 PM, baptiste Auguié wrote:
Hi,

I don't know if this can be useful to you, but I recently wrote a small 
function to read a large datafile like yours in a number of steps, with the 
possibility to save each intermediate block as .Rdata. This is based on 
read.table --- not as efficient as lower-level scan() but it might be good 
enough,

file<- 'test.txt'
## write.table(matrix(rnorm(1e6*14), ncol=14), file=file,row.names = F,
##             col.names = F )

## count the lines with wc; reading via "<" keeps the file name out of the output
n<- as.numeric(gsub("[^0123456789]","", system(paste("wc -l <", shQuote(file)),
intern=TRUE)))
n

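## split n rows into blocks of at most `size` rows
blocks<- function(n=18, size=5){
res<- rep(size, n%/%size)               # full blocks (empty when n < size)
if(n%%size) res<- c(res, n%%size)       # trailing partial block
if(!sum(res) == n) stop("block sizes do not sum to n")
res
}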
## blocks(1003, 500)
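
To address the question about full multiples: the block size does not have to divide the number of rows, because the last block simply takes the remainder. Below is a minimal sketch of that bookkeeping; the name chunk_sizes is made up for this illustration and is not part of the posted code.

```r
# Hypothetical helper: sizes of consecutive blocks covering n rows,
# each holding at most `size` rows.
chunk_sizes <- function(n, size) {
  res <- rep(size, n %/% size)               # full blocks
  if (n %% size) res <- c(res, n %% size)    # remainder, if any
  res
}

Nb   <- chunk_sizes(1003, 500)   # two full blocks plus a remainder of 3
skip <- c(0, cumsum(Nb))         # rows to skip before reading each block

Nb
skip
```

Each block ii would then be read with nrows = Nb[ii] and skip = skip[ii], which is how readBlocks below uses these two vectors.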


readBlocks<- function(file, nbk=1e5, out="tmp", save.inter=TRUE,
                        classes= c("numeric", "numeric", rep("NULL", 6),
                          "numeric", "numeric", rep("NULL", 4))){

   ## number of lines in the file, via wc (Unix-like systems only)
   n<- as.numeric(gsub("[^0123456789]","", system(paste("wc -l <", shQuote(file)),
                                                  intern=TRUE)))

   ncols<- length(grep("NULL", classes, invert=TRUE))
   results<- matrix(0, nrow=n, ncol=ncols)
   Nb<- blocks(n, nbk)
   skip<- c(0, cumsum(Nb))
   for(ii in seq_along(Nb)){
     d<- read.table(file, colClasses = classes, nrows=Nb[ii], skip=skip[ii],
                    comment.char = "")
     if(save.inter){
       save(d, file=paste(out, ".", ii, ".rda", sep=""))
       }
     print(ii)
     results[seq(1+skip[ii], skip[ii]+Nb[ii]), ]<- as.matrix(d)
     rm(d) ; gc()
   }
   save(results, file=paste(out, ".rda", sep=""))
   invisible(results)
}

## test<- readBlocks(file)
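
On whether the table size must be known a priori: readBlocks counts the lines itself with wc, so only the file name is strictly required. A possible alternative that avoids counting altogether is to read from an open connection until it is exhausted. The sketch below assumes a whitespace-separated file without a header; read_in_chunks and chunk_rows are names invented for this example, and catching every read.table error as "end of file" is a simplification.

```r
# Hypothetical helper: read a table chunk by chunk from an open connection,
# so the total number of rows need not be known beforehand.
read_in_chunks <- function(file, chunk_rows = 1e5, ...) {
  con <- file(file, open = "r")
  on.exit(close(con))
  pieces <- list()
  repeat {
    d <- tryCatch(
      read.table(con, nrows = chunk_rows, ...),
      error = function(e) NULL   # read.table stops with an error at end of input
    )
    if (is.null(d) || nrow(d) == 0) break
    pieces[[length(pieces) + 1]] <- d
    if (nrow(d) < chunk_rows) break   # a short chunk means the file is done
  }
  do.call(rbind, pieces)
}

# Small self-contained check with a temporary 7 x 3 file
tmp <- tempfile()
write.table(matrix(1:21, ncol = 3, byrow = TRUE), tmp,
            row.names = FALSE, col.names = FALSE)
res <- read_in_chunks(tmp, chunk_rows = 3)
dim(res)
```

Because the connection stays open between calls, each read.table call continues where the previous one stopped, whereas the skip= approach rereads the file from the top for every block.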

HTH,

baptiste



On Aug 12, 2010, at 1:34 PM, Martin Tomko wrote:

Hi Peter,
thank you for your reply. I still cannot get it to work.
I have modified your code as follows:
rows<-length(R)
cols<- max(unlist(lapply(R,function(x) length(unlist(gregexpr(" ",x,fixed=TRUE,useBytes=TRUE))))))
c<-scan(file=f,what=rep(c(list(NULL),rep(list(0L),cols-1),rows-1)), skip=1)
m<-matrix(c, nrow = rows-1, ncol=cols+1,byrow=TRUE);

the list c seems ok, with all the values I would expect. Still, length(c) gives 
me a value = cols+1, which I find odd (I would expect =cols).
I then repeated it rows-1 times (to account for the header row). The values 
seem ok.
Anyway, I tried to construct the matrix, but when I print it, the values are 
odd:
m[1:10,1:10]
      [,1] [,2]       [,3]       [,4]       [,5]       [,6]       [,7]
[1,] NULL Integer,15 Integer,15 Integer,15 Integer,15 Integer,15 Integer,15
[2,] NULL Integer,15 Integer,15 Integer,15 Integer,15 Integer,15 Integer,15
[3,] NULL Integer,15 Integer,15 Integer,15 Integer,15 Integer,15 Integer,15
[4,] NULL Integer,15 Integer,15 Integer,15 Integer,15 Integer,15 Integer,15
[5,] NULL Integer,15 Integer,15 Integer,15 Integer,15 Integer,15 Integer,15
[6,] NULL Integer,15 Integer,15 Integer,15 Integer,15 Integer,15 Integer,15
[7,] NULL Integer,15 Integer,15 Integer,15 Integer,15 Integer,15 Integer,15
[8,] NULL Integer,15 Integer,15 Integer,15 Integer,15 Integer,15 Integer,15
[9,] NULL Integer,15 Integer,15 Integer,15 Integer,15 Integer,15 Integer,15
[10,] NULL Integer,15 Integer,15 Integer,15 Integer,15 Integer,15 Integer,15
....

Any idea where the values are gone?
Thanks
Martin
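
The "Integer,15" cells in the printout above are what a matrix built from a list looks like: every cell holds a whole column vector (here presumably of length 15) rather than a single number. The sketch below reproduces this with a small hand-made list standing in for the scan() result; the values are invented for illustration.

```r
# Stand-in for a scan() result with what = list(NULL, 0L, 0L):
# one (skipped) NULL entry plus one integer vector per column.
cols_list <- list(NULL, c(1L, 2L, 3L), c(4L, 5L, 6L))

# matrix() on a list gives a list-matrix: each cell is an entire vector.
bad <- matrix(cols_list, nrow = 2, ncol = 3, byrow = TRUE)
is.list(bad)

# Drop the skipped column and bind the remaining vectors side by side.
keep <- !vapply(cols_list, is.null, logical(1))
m <- do.call(cbind, cols_list[keep])
dim(m)
m
```

Here m is an ordinary integer matrix with one row per data line and one column per retained field, so no reshaping with byrow=TRUE is needed.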

Hence, I filled it into the matrix of dimensions

On 8/12/2010 12:24 PM, peter dalgaard wrote:
On Aug 12, 2010, at 11:30 AM, Martin Tomko wrote:


c<-scan(file=f,what=list(c("",(rep(integer(0),cols)))), skip=1)
m<-matrix(c, nrow = rows, ncol=cols,byrow=TRUE);

for some reason I end up with a character matrix, which I don't want. Is this the proper 
way to skip the first column (this is not documented anywhere - how does one skip the 
first column in scan???). Is my way of specifying "integer(0)" correct?

No. Well, integer(0) is just superfluous where 0L would do, since scan only 
looks at the types not the contents, but more importantly, what= wants a list 
of as many elements as there are columns and you gave it


list(c("",(rep(integer(0),5))))

[[1]]
[1] ""

I think what you actually meant was

c(list(NULL),rep(list(0L),5))
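
As a small illustration of the difference, the following sketch reads an invented three-line sample through scan(text=...) instead of a file; the sample data and the variable names are assumptions made for this example only.

```r
# The original attempt collapses to a one-element list, since c() drops
# the zero-length integers and keeps only the empty string.
attempt <- list(c("", rep(integer(0), 5)))
length(attempt)

# One list element per column: NULL skips a field, 0L reads it as integer.
txt <- c("id a b c",
         "r1 1 2 3",
         "r2 4 5 6")
what <- c(list(NULL), rep(list(0L), 3))
cols <- scan(text = txt, what = what, skip = 1, quiet = TRUE)

length(cols)         # one entry per column, including the skipped one
is.null(cols[[1]])   # the skipped first column comes back as NULL
cols[[2]]            # second column, read as integers
```

The result keeps a NULL placeholder for the skipped column, so its length is the number of columns in the file rather than the number of columns read.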




And finally - would any sparse matrix package be more appropriate, and can I 
use a sparse matrix for the image() function producing typical heat maps? I have 
seen that some sparse matrix packages produce different looking outputs, which 
would not be appropriate.

Thanks
Martin
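
One way this could be approached, sketched here under the assumption that the Matrix package is installed (it is normally shipped with R as a recommended package): keep the data sparse for storage, and convert to an ordinary matrix only for plotting with the base image() function. The indices and values below are invented.

```r
library(Matrix)

# Hypothetical sparse data: three non-zero cells in a 4 x 5 grid.
sm <- sparseMatrix(i = c(1, 3, 4), j = c(2, 5, 1), x = c(10, 20, 30),
                   dims = c(4, 5))

# Base image() works on ordinary numeric matrices, so densify for plotting.
dense <- as.matrix(sm)
dim(dense)
sum(dense != 0)
```

Passing dense to image() should then give the usual base-graphics heat map. The conversion allocates the full matrix, so this only helps if the dense form fits in memory at plotting time.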

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



--
Martin Tomko
Postdoctoral Research Assistant

Geographic Information Systems Division
Department of Geography
University of Zurich - Irchel
Winterthurerstr. 190
CH-8057 Zurich, Switzerland

email:  martin.to...@geo.uzh.ch
site:   http://www.geo.uzh.ch/~mtomko
mob:    +41-788 629 558
tel:    +41-44-6355256
fax:    +41-44-6356848
