[R] Value Lookup from File without Slurping

2009-01-16 Thread Gundala Viswanath
Dear all,

I have a repository file (let's call it repo.txt)
that contains two columns like this:

# tag  value
AAA    0.2
AAT    0.3
AAC    0.02
AAG    0.02
ATA    0.3
ATT    0.7

Given another query vector

qr <- c("AAC", "ATT")

I would like to find the corresponding value for each query above,
yielding:

0.02
0.7

However, I want to avoid slurping the whole of repo.txt into an object (e.g. a hash).
Is there any way to do that?

The reason is that repo.txt is very, very large
(millions of lines,
with tag length  30 bp), and my PC does not have enough memory to hold it.

- Gundala Viswanath
Jakarta - Indonesia

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Value Lookup from File without Slurping

2009-01-16 Thread Carlos J. Gil Bellosta
Hello,

You can always load your repo.txt into a database, say SQLite, and
select only the values you want with an SQL query.

That way you avoid loading the full file into memory.
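
A minimal sketch of the database route, under stated assumptions: it uses the DBI and RSQLite packages (not named in this message), the helper names build_repo_db() and lookup_in_db() are hypothetical, and repo.txt is assumed to hold whitespace-separated "tag value" lines after a "#" header line. The file is loaded in batches so that R never holds more than one batch at a time.

```r
# Sketch only: build_repo_db() and lookup_in_db() are hypothetical helpers.
build_repo_db <- function(repo_file, db_file, chunk_size = 100000L) {
  db <- DBI::dbConnect(RSQLite::SQLite(), db_file)
  on.exit(DBI::dbDisconnect(db), add = TRUE)
  input <- file(repo_file, open = "rt")
  on.exit(close(input), add = TRUE)
  while (length(lines <- readLines(input, n = chunk_size)) > 0) {
    # Drop the "#" header/comment lines and blank lines.
    lines <- lines[!grepl("^[[:space:]]*(#|$)", lines)]
    if (length(lines) == 0) next
    fields <- strsplit(trimws(lines), "[[:space:]]+")
    chunk <- data.frame(
      tag = vapply(fields, `[`, character(1), 1),
      value = as.numeric(vapply(fields, `[`, character(1), 2)),
      stringsAsFactors = FALSE
    )
    DBI::dbWriteTable(db, "repo", chunk, append = TRUE)
  }
  if (DBI::dbExistsTable(db, "repo")) {
    # An index on tag lets later lookups avoid scanning the whole table.
    DBI::dbExecute(db, "CREATE INDEX IF NOT EXISTS repo_tag ON repo (tag)")
  }
  invisible(db_file)
}

lookup_in_db <- function(db_file, tags) {
  db <- DBI::dbConnect(RSQLite::SQLite(), db_file)
  on.exit(DBI::dbDisconnect(db), add = TRUE)
  quoted <- paste(DBI::dbQuoteString(db, tags), collapse = ", ")
  sql <- paste0("SELECT tag, value FROM repo WHERE tag IN (", quoted, ")")
  hits <- DBI::dbGetQuery(db, sql)
  # Return values in the order of the query; NA for tags not in the file.
  hits$value[match(tags, hits$tag)]
}

# Usage on a small stand-in file; skipped when the packages are missing.
if (requireNamespace("DBI", quietly = TRUE) &&
    requireNamespace("RSQLite", quietly = TRUE)) {
  repo <- tempfile(fileext = ".txt")
  writeLines(c("# tag  value", "AAA   0.2", "AAC   0.02", "ATT   0.7"), repo)
  db_file <- tempfile(fileext = ".sqlite")
  build_repo_db(repo, db_file, chunk_size = 2L)
  print(lookup_in_db(db_file, c("AAC", "ATT")))
}
```

The database file only has to be built once; later lookups open it and run a single query, so only the matching rows are ever returned to R.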

Best regards,

Carlos J. Gil Bellosta
http://www.datanalytics.com



Re: [R] Value Lookup from File without Slurping

2009-01-16 Thread Wacek Kusnierczyk
you might try to iteratively read a limited number of lines in a
batch using readLines:

# filename, the name of your file
# n, the maximal count of lines to read in a batch
connection = file(filename, open="rt")
while (length(lines <- readLines(con=connection, n=n))) {
   # do your stuff here
}
close(connection)

?file
?readLines
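
One way to fill in the loop body for the lookup in question, as a sketch rather than a definitive implementation. The function name lookup_in_chunks() is made up for illustration, and the code assumes that each line is "tag value" separated by whitespace and that the query tags are unique.

```r
# Hypothetical helper: look up `tags` in `filename`, reading n lines at a time.
lookup_in_chunks <- function(filename, tags, n = 10000L) {
  found <- setNames(rep(NA_real_, length(tags)), tags)
  connection <- file(filename, open = "rt")
  on.exit(close(connection))
  while (length(lines <- readLines(connection, n = n)) > 0) {
    # First whitespace-delimited field is the tag, second is the value.
    fields <- strsplit(trimws(lines), "[[:space:]]+")
    chunk_tags <- vapply(fields, `[`, character(1), 1)
    hit <- chunk_tags %in% tags
    if (any(hit)) {
      chunk_values <- vapply(fields[hit], `[`, character(1), 2)
      found[chunk_tags[hit]] <- as.numeric(chunk_values)
    }
  }
  found
}

# Usage on a small stand-in for repo.txt, with a tiny batch size
# so that the loop runs over several batches.
repo <- tempfile(fileext = ".txt")
writeLines(c("# tag  value", "AAA   0.2", "AAT   0.3", "AAC   0.02",
             "AAG   0.02", "ATA   0.3", "ATT   0.7"), repo)
result <- lookup_in_chunks(repo, c("AAC", "ATT"), n = 2L)
print(result)
```

Only one batch of n lines is held in memory at a time, so n trades memory use against the number of readLines calls. Tags that never appear in the file come back as NA.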

vQ




Re: [R] Value Lookup from File without Slurping

2009-01-16 Thread Gabor Grothendieck
The sqldf package can read a large file into a database without going
through R, and then extract only the portion you want.  The package makes
it easier to use RSQLite by setting up the database for you and removing
it automatically once the extraction is done.  You can
specify all this in two lines: one to name the file and one to specify
the extraction using SQL. See Example 6 on the
home page:

http://sqldf.googlecode.com#Example_6._File_Input
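
For the lookup in this thread, that might look roughly like the sketch below. It uses read.csv.sql() from current versions of sqldf, which refers to the input file as the table "file" in the SQL statement. The wrapper name lookup_with_sqldf() is hypothetical, and the file is assumed to be tab-separated with a plain header line "tag<TAB>value" (no leading "#") and tags that contain no quote characters.

```r
# Hypothetical wrapper around sqldf's read.csv.sql().
lookup_with_sqldf <- function(repo_file, tags) {
  in_list <- paste0("'", tags, "'", collapse = ", ")  # tags assumed quote-free
  sql <- paste0("select tag, value from file where tag in (", in_list, ")")
  hits <- read.csv.sql(repo_file, sql = sql, header = TRUE, sep = "\t")
  # Rows come back in file order; reorder to match the query vector.
  hits$value[match(tags, hits$tag)]
}

# Usage on a small stand-in file; skipped when sqldf is not installed.
have_sqldf <- requireNamespace("sqldf", quietly = TRUE)
if (have_sqldf) {
  library(sqldf)
  repo <- tempfile(fileext = ".txt")
  writeLines(c("tag\tvalue", "AAA\t0.2", "AAT\t0.3", "AAC\t0.02",
               "AAG\t0.02", "ATA\t0.3", "ATT\t0.7"), repo)
  result <- lookup_with_sqldf(repo, c("AAC", "ATT"))
  print(result)
}
```

Only the rows selected by the where clause are returned to R as a data frame; the rest of the file stays in the temporary database.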



Re: [R] Value Lookup from File without Slurping

2009-01-16 Thread r...@quantide.com

Something like this should work:

library(R.utils)   # for countLines()
out = numeric()
qr = c("AAC", "ATT")
n = countLines("test.txt")
con = file("test.txt", "r")
for (i in seq_len(n)) {
  line = readLines(con, n = 1)
  # split on runs of whitespace: first field is the tag, second the value
  fields = strsplit(line, split = "[[:space:]]+")[[1]]
  if (is.element(fields[1], qr)) {
    out = c(out, as.numeric(fields[2]))
  }
}
close(con)

You may want to improve execution speed by reading the data in chunks
instead of line by line; the code needs only a small modification.







Re: [R] Value Lookup from File without Slurping

2009-01-16 Thread Wacek Kusnierczyk
if the file is really large, reading it twice may add a considerable penalty:

r...@quantide.com wrote:
> Something like this should work
>
> library(R.utils)
> out = numeric()
> qr = c("AAC", "ATT")
> n = countLines("test.txt")

# 1st pass

> file = file("test.txt", "r")
> for (i in 1:n){

# 2nd pass

> line = readLines(file, n = 1)
> A = strsplit(line, split = " ")[[1]][1]
> if(is.element(A, qr)) {
> value = as.numeric(strsplit(line, split = " ")[[1]][2])
> out = c(out, value)
> }
> }

if this is a one-go task, counting the lines does not pay, and why
bother.  if this is a repetitive task, a database-based solution will
probably be a better idea.
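
a sketch of a single-pass variant that does without the line count, assuming whitespace-separated "tag value" lines. the name lookup_single_pass() is made up for illustration. it relies on readLines() returning a zero-length vector at end of file, and it stops early once every queried tag has been seen.

```r
# Hypothetical helper: one pass over the file, no countLines() needed.
lookup_single_pass <- function(filename, tags) {
  tags <- unique(tags)
  out <- setNames(rep(NA_real_, length(tags)), tags)
  seen <- setNames(rep(FALSE, length(tags)), tags)
  con <- file(filename, open = "r")
  on.exit(close(con))
  # Stop at end of file, or as soon as all tags have been found.
  while (!all(seen) && length(line <- readLines(con, n = 1)) > 0) {
    fields <- strsplit(trimws(line), "[[:space:]]+")[[1]]
    if (length(fields) >= 2 && fields[1] %in% tags && !seen[[fields[1]]]) {
      out[[fields[1]]] <- as.numeric(fields[2])
      seen[[fields[1]]] <- TRUE
    }
  }
  out
}

# Usage on a small stand-in file.
test_file <- tempfile(fileext = ".txt")
writeLines(c("# tag  value", "AAA   0.2", "AAC   0.02",
             "ATT   0.7", "TTT   0.9"), test_file)
single_pass_result <- lookup_single_pass(test_file, c("AAC", "ATT"))
print(single_pass_result)
```

reading one line per call is still slow on millions of lines; the early exit only helps when the queried tags happen to sit near the top of the file.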

vQ



Re: [R] Value Lookup from File without Slurping

2009-01-16 Thread r...@quantide.com

I agree on the database solution.
Databases are the right tool to solve this kind of problem.
Just consider the start-up cost of setting up the database. This could
be a very time-consuming task for someone who is not familiar with database
technology.


Calling file() does not actually read the whole file. This function
simply opens a connection to the file without reading it.

countLines should do something like wc -l from a bash shell.

I would say that if this is a one-time job this solution should work,
even though it is not the fastest. If the job is a repetitive one,
then a database solution is surely better.


A.




Re: [R] Value Lookup from File without Slurping

2009-01-16 Thread Wacek Kusnierczyk
r...@quantide.com wrote:

> Using file() is not a real reading of all the file. This function will
> simply open a connection to the file without reading it.
> countLines should do something like wc -l from a bash shell


just for a test:

cat(rep('', 10^7), file='test.txt', fill=1)
library(R.utils)
system.time(countLines('test.txt'))

... and the file is just about 30MB (and it makes no real difference if
it is stuffed with newlines or not).

vQ



Re: [R] Value Lookup from File without Slurping

2009-01-16 Thread Wacek Kusnierczyk
r...@quantide.com wrote:
> I agree on the database solution.
> Databases are the right tool to solve this kind of problem.
> Only consider the start up cost of setting up the database. This could
> be a very time consuming task if someone is not familiar with database
> technology.

and won't pay if you want to do the lookup just once.


> Using file() is not a real reading of all the file. This function will
> simply open a connection to the file without reading it.
> countLines should do something like wc -l from a bash shell

... and wc knows the count of lines in a file without reading it

vQ



Re: [R] Value Lookup from File without Slurping

2009-01-16 Thread Gabor Grothendieck
On Fri, Jan 16, 2009 at 5:52 AM, r...@quantide.com wrote:
> I agree on the database solution.
> Databases are the right tool to solve this kind of problem.
> Only consider the start up cost of setting up the database. This could be a
> very time consuming task if someone is not familiar with database
> technology.

Using sqldf, as mentioned previously on this thread, allows one to use
the SQLite database with no setup at all.  sqldf automatically creates
the database, generates the record layout, loads the file (outside of R,
so R does not slow it down), extracts the
portion you want into R by issuing the appropriate calls to RSQLite/DBI, and
destroys the database afterwards.  When you
install sqldf it automatically installs RSQLite and the SQLite database
itself, so the entire installation is just one line: install.packages("sqldf")



Re: [R] Value Lookup from File without Slurping

2009-01-16 Thread Gundala Viswanath
Hi Gabor,

Do you mean that storing data with sqldf doesn't take memory?
For example, I have a 3GB data file. As a standard R object created with read.table(),
the object size roughly doubles to ~6GB. My current 4GB of RAM
cannot handle that.

Do you mean that with sqldf this is not an issue?
Why is that?

Sorry for my naive question.

- Gundala Viswanath
Jakarta - Indonesia





Re: [R] Value Lookup from File without Slurping

2009-01-16 Thread Gabor Grothendieck
Only the portion you extract is ever in R -- the file itself is read
into a database
without ever going through R, so your memory requirements correspond to what
you extract, not the size of the file.



Re: [R] Value Lookup from File without Slurping

2009-01-16 Thread Gundala Viswanath
Hi,

> Unless you specify an in-memory database the database is stored on disk.

Thanks for your explanation.
I just downloaded 'sqldf'.

Where can I find the option for that? I can't see it among the sqldf arguments.

I looked at:
envir = parent.frame()

but that doesn't appear to be the one.

- Gundala Viswanath
Jakarta - Indonesia


> On Fri, Jan 16, 2009 at 10:59 AM, Gundala Viswanath gunda...@gmail.com wrote:
>> Hi Gabor,
>>
>>> the file itself is read into a database
>>
>> The above doesn't use RAM memory?
>>
>> Rgds,
>> GV.
>>
>>> without ever going through R so your memory requirements correspond to what
>>> you extract, not the size of the file.



Re: [R] Value Lookup from File without Slurping

2009-01-16 Thread Gabor Grothendieck
If that refers to using a database on disk to temporarily hold
the file, then Example 6 on the home page shows it, as mentioned.
You may wish to look at the other examples there too, and
there is further documentation in the ?sqldf help file.
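
A minimal sketch of where that option goes, on the understanding that the dbname argument of sqldf() is what selects the database and that passing a temporary file name keeps the database on disk instead of in memory. The file layout (tab-separated, plain "tag<TAB>value" header) and the variable names are assumptions made for this illustration.

```r
# Skipped when sqldf is not installed.
if (requireNamespace("sqldf", quietly = TRUE)) {
  library(sqldf)

  # Small stand-in for repo.txt.
  repo_path <- tempfile(fileext = ".txt")
  writeLines(c("tag\tvalue", "AAA\t0.2", "AAC\t0.02", "ATT\t0.7"), repo_path)

  # An unopened file connection; sqldf uses it to locate the file and
  # refers to it in SQL by the name of the R variable.
  repo <- file(repo_path)
  hits <- sqldf("select tag, value from repo where tag in ('AAC', 'ATT')",
                dbname = tempfile(),   # scratch database on disk
                file.format = list(header = TRUE, sep = "\t"))
  close(repo)
  print(hits)
}
```

Here hits is a data frame holding only the matching rows, so the memory used in R depends on the size of the result, not on the size of the file.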
