Hi all,
I have a large file (1.8 GB) with 900,000 lines that I would like to read.
Each line is a character string. Specifically, I would like to randomly
select 3000 lines. For smaller files, what I'm doing is:

trs <- scan("myfile", what = character(), sep = "\n")  # read every line into memory
trs <- trs[sample(length(trs), 3000)]                  # keep 3000 lines at random

This works fine; however, my computer does not seem able to handle the 1.8 GB
file, since scan() loads all 900,000 lines into memory at once.
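One idea, sketched below and untested on the full file, assuming the file really
has 900,000 newline-terminated lines (the chunk size of 10,000 is arbitrary):
read the file in chunks through a connection, so only one chunk is in memory at
a time, and keep just the sampled lines as they go by:

sel  <- sort(sample(900000, 3000))     # line numbers to keep, in file order
con  <- file("myfile", open = "r")
keep <- character(0)
done <- 0                              # lines read so far
repeat {
  chunk <- readLines(con, n = 10000)   # one chunk at a time
  if (length(chunk) == 0) break        # end of file
  hit  <- sel[sel > done & sel <= done + length(chunk)]
  keep <- c(keep, chunk[hit - done])   # positions within this chunk
  done <- done + length(chunk)
}
close(con)
writeLines(keep, "myfile_short")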
I also thought of an alternative that does not require reading the whole file:

sel <- sample(900000, 3000)            # line numbers to extract
for (i in 1:3000) {
  # skip = sel[i] - 1 so that line sel[i] itself is the one read
  un <- scan("myfile", what = character(), sep = "\n",
             skip = sel[i] - 1, nlines = 1)
  write(un, "myfile_short", append = TRUE)
}

This works on my computer; however, it is extremely slow: each call to scan()
has to re-read the file from the top just to skip to one line. It has been
running for 25 hours and I think it has done less than half of the lines (yes,
I probably do not have a very good computer, and I'm working under Windows ...).
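One thing I wonder about, again just a rough sketch I have not tested on the
full file: keep a single connection open and skip forward relative to the
previous sampled line, so each line of the file is read only once instead of
re-scanning from the top on every iteration (assuming exactly 900,000 lines):

sel <- sort(sample(900000, 3000))      # sorted, so every skip is forward
con <- file("myfile", open = "r")
out <- character(length(sel))
pos <- 0                               # lines consumed so far
for (i in seq_along(sel)) {
  out[i] <- scan(con, what = character(), sep = "\n",
                 skip = sel[i] - pos - 1, nlines = 1, quiet = TRUE)
  pos <- sel[i]                        # connection now sits just past line sel[i]
}
close(con)
writeLines(out, "myfile_short")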
So my question is: do you know of a faster way to do this?
Thanks in advance

Juli

-- 
http://www.ceam.es/pausas
