Re: [R] read multiple large files into one dataframe

2010-01-25 Thread Paul Hiemstra

Brad Patrick Schneid wrote:
> ### The following is very helpful ###
> listOfFiles <- list.files(pattern = ".txt")
> d <- do.call(rbind, lapply(listOfFiles, read.table))
>
> but what if each file contains information corresponding to a different
> subject and I need to be able to tell where each row came from?  i.e.: I
> need a new row

a new column I presume, not a row

> that repeats the original filename for each observation of
> the former respective files.
>
> Any ideas?

In your code

listOfFiles <- list.files(pattern = ".txt")
d <- do.call(rbind, lapply(listOfFiles, read.table))

you can replace the read.table function with a custom function:

listOfFiles <- list.files(pattern = ".txt")
d <- do.call(rbind, lapply(listOfFiles, function(fname) {
  dum <- read.table(fname)     # read one file
  dum$which_file <- fname      # tag every row with its source file
  dum
}))

Now d has an additional column identifying which file each row originally
belonged to.
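
For anyone who wants to try the idea without real data, here is a minimal
self-contained sketch. The temporary directory, the two file names and the
helper name read_with_source are invented for the illustration; the anchored
pattern "\\.txt$" is a stricter variant I am assuming is wanted, so that only
names ending in .txt match.

```r
# Hypothetical demo data: two small header-less files in a temp directory.
demo_dir <- file.path(tempdir(), "which_file_demo")
dir.create(demo_dir, showWarnings = FALSE)
write.table(data.frame(a = 1:2, b = c(0.5, 1.5)),
            file.path(demo_dir, "subjectA.txt"),
            row.names = FALSE, col.names = FALSE)
write.table(data.frame(a = 3:5, b = c(2.5, 3.5, 4.5)),
            file.path(demo_dir, "subjectB.txt"),
            row.names = FALSE, col.names = FALSE)

# Anchored pattern: match only names that end in ".txt".
listOfFiles <- list.files(demo_dir, pattern = "\\.txt$", full.names = TRUE)

# Hypothetical helper: read one file and tag each row with its file name.
read_with_source <- function(fname) {
  dum <- read.table(fname, header = FALSE)
  dum$which_file <- basename(fname)
  dum
}

d <- do.call(rbind, lapply(listOfFiles, read_with_source))
table(d$which_file)   # number of rows contributed by each file
```

Using basename() keeps the identifier short when list.files() is called with
full.names = TRUE; drop it if the full path is needed to tell files apart.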


cheers,
Paul

--
Drs. Paul Hiemstra
Department of Physical Geography
Faculty of Geosciences
University of Utrecht
Heidelberglaan 2
P.O. Box 80.115
3508 TC Utrecht
Phone:  +3130 274 3113 Mon-Tue
Phone:  +3130 253 5773 Wed-Fri
http://intamap.geo.uu.nl/~paul

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] read multiple large files into one dataframe

2010-01-25 Thread hadley wickham
On Mon, Jan 25, 2010 at 4:43 AM, Paul Hiemstra p.hiems...@geo.uu.nl wrote:
> Brad Patrick Schneid wrote:
>> ### The following is very helpful ###
>> listOfFiles <- list.files(pattern = ".txt")
>> d <- do.call(rbind, lapply(listOfFiles, read.table))
>>
>> but what if each file contains information corresponding to a different
>> subject and I need to be able to tell where each row came from?  i.e.: I
>> need a new row
>
> a new column I presume, not a row
>
>> that repeats the original filename for each observation of
>> the former respective files.
>>
>> Any ideas?
>
> listOfFiles <- list.files(pattern = ".txt")
> d <- do.call(rbind, lapply(listOfFiles, read.table))

Or use the plyr package:

library(plyr)
listOfFiles <- list.files(pattern = ".txt")
names(listOfFiles) <- basename(listOfFiles)

# because listOfFiles is named, ldply() adds the names as a .id column
d <- ldply(listOfFiles, read.table)

See http://had.co.nz/plyr for more info.
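
If plyr is not available, roughly the same result can be sketched in base R.
This is not how ldply() is implemented; it only mimics the effect of naming
the input. The demo files are invented, and the id column is called .id here
simply to match the name plyr uses.

```r
# Hypothetical demo files so the sketch runs on its own.
demo_dir <- file.path(tempdir(), "id_demo")
dir.create(demo_dir, showWarnings = FALSE)
writeLines(c("1 10", "2 20"), file.path(demo_dir, "day1.txt"))
writeLines(c("3 30"),         file.path(demo_dir, "day2.txt"))

listOfFiles <- list.files(demo_dir, pattern = "\\.txt$", full.names = TRUE)
names(listOfFiles) <- basename(listOfFiles)

# lapply() keeps the names, so 'pieces' is a named list of data frames.
pieces <- lapply(listOfFiles, read.table)

# Bind once, then build the id column by repeating each name nrow() times.
bound <- do.call(rbind, unname(pieces))
ids   <- rep(names(pieces), vapply(pieces, nrow, integer(1)))
d     <- data.frame(.id = ids, bound, stringsAsFactors = FALSE)
d
```

Building the id vector after the bind touches the combined data frame only
once, instead of adding a column to every piece before binding.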

Hadley

-- 
http://had.co.nz/



Re: [R] read multiple large files into one dataframe

2010-01-25 Thread Brad Patrick Schneid

That's it, Hadley!!!
Thank you.


Re: [R] read multiple large files into one dataframe

2010-01-24 Thread Brad Patrick Schneid

### The following is very helpful ###
listOfFiles <- list.files(pattern = ".txt")
d <- do.call(rbind, lapply(listOfFiles, read.table))
###

but what if each file contains information corresponding to a different
subject and I need to be able to tell where each row came from?  i.e.: I
need a new row that repeats the original filename for each observation of
the former respective files.

Any ideas? 




[R] read multiple large files into one dataframe

2009-05-13 Thread SYKES, Jennifer
Hello,

Apologies if this is a simple question; I have searched the help and
have not managed to work out a solution.

Does anybody know an efficient method for reading many text files of the
same format into one table/dataframe?

I have around 90 files that contain continuous data over 3 months but
that are split into individual days' data, and I need the whole 3 months
in one file for analysis.  Each day's file contains a large amount of
data (approx. 30MB each), so I need a memory-efficient method to merge
all of the files into one dataframe object.  From what I have read I
will probably want to avoid using for loops etc.?  All files are in the
same directory, none has a header row, and each contains around 180,000
rows and the same 25 columns/variables.  Any suggested packages/routines
would be very useful.

Thanks

Jennifer

 

 



If
you are not the intended recipient, please notify our Help Desk at
Email postmas...@nats.co.uk immediately. You should not copy or use
this email or attachment(s) for any purpose nor disclose their
contents to any other person. NATS computer systems may be
monitored and communications carried on them recorded, to secure
the effective operation of the system and for other lawful
purposes. Please note that neither NATS nor the sender accepts any
responsibility for viruses or any losses caused as a result of
viruses and it is your responsibility to scan or otherwise check
this email and any attachments. NATS means NATS (En Route) plc
(company number: 4129273), NATS (Services) Ltd (company number
4129270), NATSNAV Ltd (company number: 4164590) or NATS Ltd
(company number 3155567) or NATS Holdings Ltd (company number
4138218). All companies are registered in England and their
registered office is at 5th Floor, Brettenham House South,
Lancaster Place, London, WC2E 7EN.


Re: [R] read multiple large files into one dataframe

2009-05-13 Thread baptiste auguie

I'd first try plyr and see if it's efficient enough,

library(plyr)

listOfFiles <- list.files(pattern = ".txt")

d <- ldply(listOfFiles, read.table)
str(d)

alternatively,

d <- do.call(rbind, lapply(listOfFiles, read.table))



HTH,

baptiste


On 13 May 2009, at 12:45, SYKES, Jennifer wrote:

> Does anybody know an efficient method for reading many text files of the
> same format into one table/dataframe?

_

Baptiste Auguié

School of Physics
University of Exeter
Stocker Road,
Exeter, Devon,
EX4 4QL, UK

Phone: +44 1392 264187

http://newton.ex.ac.uk/research/emag



Re: [R] read multiple large files into one dataframe

2009-05-13 Thread Mike Lawrence
What types of data are in each file? All numbers, or a mix of numbers
and characters? Any missing data or special NA values?

On Wed, May 13, 2009 at 7:45 AM, SYKES, Jennifer
jennifer.sy...@nats.co.uk wrote:
> Does anybody know an efficient method for reading many text files of the
> same format into one table/dataframe?

-- 
Mike Lawrence
Graduate Student
Department of Psychology
Dalhousie University

Looking to arrange a meeting? Check my public calendar:
http://tr.im/mikes_public_calendar

~ Certainty is folly... I think. ~



Re: [R] read multiple large files into one dataframe

2009-05-13 Thread Simon Pickett

Can you provide reproducible code, please?

Even a fake example would help.

I would

1) set up a loop to read in each file from a directory;
2) inside the loop, chop up/aggregate the data, each file in turn, and write
each new aggregated file out to a directory using write.table(). This will
reduce the memory needed by only including the info you want. Make sure each
file is a data frame with the same column names;
3) set up a new loop to read in each new small file and rbind them all
together to make your new master file.
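
The three steps above could be sketched roughly as follows. Everything about
the data is made up: the two directories, the file names, the two-column
layout (sensor id and value) and the choice of a per-sensor mean as the
aggregation. Only the read / reduce / write.table() / rbind pattern comes from
the steps themselves.

```r
# Hypothetical raw data: two header-less daily files, columns = sensor, value.
raw_dir <- file.path(tempdir(), "raw_days")
agg_dir <- file.path(tempdir(), "agg_days")
dir.create(raw_dir, showWarnings = FALSE)
dir.create(agg_dir, showWarnings = FALSE)
writeLines(c("1 10", "1 20", "2 30"), file.path(raw_dir, "day01.txt"))
writeLines(c("1 40", "2 50", "2 70"), file.path(raw_dir, "day02.txt"))

# Steps 1 and 2: read one file at a time, reduce it, write the small result.
for (f in list.files(raw_dir, pattern = "\\.txt$", full.names = TRUE)) {
  day   <- read.table(f, header = FALSE, col.names = c("sensor", "value"))
  small <- aggregate(value ~ sensor, data = day, FUN = mean)
  small$day <- sub("\\.txt$", "", basename(f))   # remember the source day
  write.table(small, file.path(agg_dir, basename(f)), row.names = FALSE)
}

# Step 3: read the small files back and bind them into one master data frame.
master <- NULL
for (f in list.files(agg_dir, pattern = "\\.txt$", full.names = TRUE)) {
  master <- rbind(master, read.table(f, header = TRUE))
}
master
```

Only one raw file is held in memory at any time during the first loop, which
is the point of the approach; growing master with rbind() in the second loop
is acceptable here only because the aggregated pieces are small.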


The R gurus may have a more parsimonious solution.

HTH

Simon.


----- Original Message -----
From: SYKES, Jennifer jennifer.sy...@nats.co.uk
To: r-help@r-project.org
Sent: Wednesday, May 13, 2009 11:45 AM
Subject: [R] read multiple large files into one dataframe

> Does anybody know an efficient method for reading many text files of the
> same format into one table/dataframe?



Re: [R] read multiple large files into one dataframe

2009-05-13 Thread Liaw, Andy
A few points to consider:

- If all the data are numeric, then use matrices instead of data frames.

- With either data frames or matrices, there is no way (that I'm aware
of anyway) in R to stack them without making at least one copy in
memory.

- Since none of the files has a header row, I would concatenate them
into one file outside R (e.g., on *nix, cat * > all.txt) and then read
that in.  You can also try it inside R with something like
read.table(pipe()).  You will want to make use of the colClasses
argument in read.table() to specify the column types, though, to ensure
that read.table() only goes through the input once.

- You're probably better off getting the data into a database (even
something like sqlite) and using an R interface to that database.

- 30MB x 90 = 2.7GB.  Unless you're on a 64-bit machine with lots of
RAM, you're not likely to have much fun with the data even when you
manage to get it into R in one piece.
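
As a rough illustration of the concatenation, colClasses and matrix points
above, here is a small sketch that stays inside R. The demo files, their
three-column layout and the column types are assumptions made for the example;
file.append() is used here as a portable stand-in for the shell's cat, not as
the method suggested above.

```r
# Hypothetical demo files: header-less, all numeric, three columns each.
demo_dir <- file.path(tempdir(), "concat_demo")
dir.create(demo_dir, showWarnings = FALSE)
writeLines(c("1 0.1 10", "2 0.2 20"), file.path(demo_dir, "day01.txt"))
writeLines(c("3 0.3 30"),             file.path(demo_dir, "day02.txt"))
parts <- list.files(demo_dir, pattern = "^day.*\\.txt$", full.names = TRUE)

# Concatenate from within R: file.create() makes (or truncates) the target,
# file.append() then copies each part onto its end.
all_file <- file.path(tempdir(), "all.txt")
file.create(all_file)
for (p in parts) file.append(all_file, p)

# Declaring the column types up front spares read.table() from guessing them.
d <- read.table(all_file, header = FALSE,
                colClasses = c("integer", "numeric", "numeric"))

# If every column is numeric, scan() into a matrix avoids data frame overhead.
m <- matrix(scan(all_file, what = numeric(), quiet = TRUE),
            ncol = 3, byrow = TRUE)
dim(m)
```

As a size check on the figures in this thread: if all 25 columns were stored
as doubles, 90 x 180,000 x 25 values at 8 bytes each comes to about 3.2 GB,
which is in the same range as the 2.7 GB of text.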

Andy

From: SYKES, Jennifer
> Does anybody know an efficient method for reading many text files of the
> same format into one table/dataframe?