On Fri, 2009-01-16 at 18:02 +0900, Gundala Viswanath wrote:
Dear all,
I have a repository file (let's call it repo.txt)
that contain two columns like this:
# tag value
AAA 0.2
AAT 0.3
AAC 0.02
AAG 0.02
ATA 0.3
ATT 0.7
Given another query vector
qr <- c("AAC",
you might try to iteratively read a limited number of lines in a
batch using readLines:

# filename: the name of your file
# n: the maximal count of lines to read in a batch
connection <- file(filename, open = "rt")
while (length(lines <- readLines(con = connection, n = n))) {
  # do your stuff
}
close(connection)
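A concrete, runnable sketch of this batching pattern (the temporary file, batch size, and query tags below are illustrative, not from the original post):

```r
# Sketch of the batched-readLines pattern: the file is processed a few
# lines at a time, so it never has to fit in memory all at once.
tmp <- tempfile(fileext = ".txt")
writeLines(c("AAA 0.2", "AAT 0.3", "AAC 0.02", "ATT 0.7"), tmp)

qr  <- c("AAC", "ATT")   # tags we want to look up
out <- numeric()
batch_size <- 2          # small here; use e.g. 10000 for a large file

connection <- file(tmp, open = "rt")
while (length(lines <- readLines(con = connection, n = batch_size))) {
  # split each line into tag and value, keep only the queried tags
  for (p in strsplit(lines, " ")) {
    if (p[1] %in% qr) out <- c(out, as.numeric(p[2]))
  }
}
close(connection)
print(out)
```

With the sample file above, `out` collects the values for AAC and ATT.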
The sqldf package can read a large file into a database without going
through R, and then extract just the portion you want. The package makes
it easier to use RSQLite by setting up the database for you and removing
the database automatically after the extraction. You can
specify all this in two lines of code.
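A hedged sketch of that approach using sqldf's `read.csv.sql` (the file and query set are illustrative; the block is guarded because sqldf may not be installed):

```r
# Illustrative file with a plain "tag value" header; the original
# repo.txt may differ.
tmp <- tempfile(fileext = ".txt")
writeLines(c("tag value", "AAA 0.2", "AAC 0.02", "ATT 0.7"), tmp)

if (requireNamespace("sqldf", quietly = TRUE)) {
  # The file is loaded into a temporary SQLite database behind the
  # scenes; only the matching rows ever reach R.
  hits <- sqldf::read.csv.sql(tmp,
    sql = "select * from file where tag in ('AAC','ATT')",
    header = TRUE, sep = " ")
  print(hits)
} else {
  message("sqldf not installed; see the sqldf home page for examples")
}
```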
Something like this should work:

library(R.utils)
out = numeric()
qr = c("AAC", "ATT")
n = countLines("test.txt")
file = file("test.txt", "r")
for (i in 1:n) {
  line = readLines(file, n = 1)
  A = strsplit(line, split = " ")[[1]][1]
  if (is.element(A, qr)) {
    value = as.numeric(strsplit(line, split = " ")[[1]][2])
    out = c(out, value)
  }
}
close(file)
If the file is really large, reading it twice may add a considerable penalty:
r...@quantide.com wrote:
Something like this should work
library(R.utils)
out = numeric()
qr = c("AAC", "ATT")
n = countLines("test.txt")   # 1st pass over the file
file = file("test.txt", "r")
for (i in 1:n) {
  line = readLines(file, n = 1)   # 2nd pass
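One way to avoid the counting pass (a sketch, not from the original thread) is to read until readLines returns zero lines, so the file is traversed only once:

```r
# Single-pass sketch: no countLines(), so the file is read only once.
# The temporary file and query tags are illustrative.
tmp <- tempfile(fileext = ".txt")
writeLines(c("AAA 0.2", "AAC 0.02", "ATT 0.7"), tmp)

qr  <- c("AAC", "ATT")
out <- numeric()

con <- file(tmp, open = "r")
repeat {
  line <- readLines(con, n = 1)
  if (length(line) == 0) break          # end of file reached
  fields <- strsplit(line, " ")[[1]]
  if (fields[1] %in% qr) out <- c(out, as.numeric(fields[2]))
}
close(con)
print(out)
```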
I agree on the database solution.
Databases are the right tool to solve this kind of problem.
Only consider the start-up cost of setting up the database. This could
be a very time-consuming task for someone who is not familiar with
database technology.
r...@quantide.com wrote:
Using file() is not a real reading of the whole file. This function will
simply open a connection to the file without reading it.
countLines should do something like wc -l from a bash shell.
just for a test:
cat(rep('', 10^7), file='test.txt', fill=1)
library(R.utils)
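For comparison, here is a base-R line counter in the spirit of countLines / wc -l (a sketch only; R.utils' actual implementation differs):

```r
# Base-R sketch of what a countLines-style function does: count lines
# in chunks, without ever holding the whole file in memory.
count_lines <- function(path, chunk = 65536L) {
  con <- file(path, open = "r")
  on.exit(close(con))
  total <- 0L
  while (length(batch <- readLines(con, n = chunk)) > 0) {
    total <- total + length(batch)
  }
  total
}

tmp <- tempfile(fileext = ".txt")
writeLines(rep("x", 1000), tmp)
print(count_lines(tmp))
```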
r...@quantide.com wrote:
and won't pay if you
On Fri, Jan 16, 2009 at 5:52 AM, r...@quantide.com r...@quantide.com wrote:
Hi Gabor,
Do you mean storing data with 'sqldf' doesn't take memory?
For example, I have a 3GB data file. With a standard R object using read.table(),
the object size will roughly double to ~6GB. My current 4GB RAM
cannot handle that.
Do you mean with sqldf, this is not the issue?
Why is that?
Sorry for
Only the portion you extract is ever in R -- the file itself is read
into a database
without ever going through R, so your memory requirements correspond to
what you extract, not to the size of the file.
On Fri, Jan 16, 2009 at 10:49 AM, Gundala Viswanath gunda...@gmail.com wrote:
Hi,
Unless you specify an in-memory database, the database is stored on disk.
Thanks for your explanation.
I just downloaded 'sqldf'.
Where can I find the option for that? I can't see the command in sqldf.
I looked at:
envir = parent.frame()
but it doesn't appear to be the one.
- Gundala Viswanath
If that refers to using a database on disk to temporarily hold
the file, then example 6 on the home page shows it, as mentioned.
You may wish to look at the other examples there too, and
there is further documentation in the ?sqldf help file.
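A hedged sketch of how sqldf's `dbname` argument selects where the database lives (a file path means on-disk; omitting it keeps it in memory). The data frame here is illustrative, and the block is guarded because sqldf may not be installed:

```r
# Illustrative: run a query through an on-disk SQLite database by
# passing a file path as dbname; sqldf creates and removes it for us.
if (requireNamespace("sqldf", quietly = TRUE)) {
  library(sqldf)
  DF  <- data.frame(tag = c("AAC", "ATT"), value = c(0.02, 0.7))
  res <- sqldf("select * from DF where value > 0.1",
               dbname = tempfile())
  print(res)
} else {
  message("sqldf not installed; this sketch is illustrative only")
}
```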