Re: [R] How to read plain text documents into a vector?

2009-10-14 Thread kenhorvath





Richard Liu wrote:
 
 
> I tried the other two suggestions, but paste seemed not to glue the
> separate lines together into one character string.  Perhaps I missed
> something (collapse?).  Perhaps I'll have another look.
 
 

Yes, that is exactly what 'collapse' does! When you read text with readLines,
R turns every line of the original document into one element of a character
vector, so a text with 30 lines ends up as a vector with 30 elements. To
get one vector element per document, you need to collapse these, say, 30
elements into a single one. The value you assign to collapse is the
character (or character sequence) that R puts between the individual
elements. If you do not need to preserve the paragraph structure, a single
space is the logical choice (collapse = " "). (Without collapse, paste just
converts its argument to character, element by element, so using paste alone
on the vector produced by readLines changes nothing; collapse is the whole
point here.)
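
To make the difference concrete, here is a minimal sketch. The vector txt is
only a stand-in for what readLines would return for a three-line file; its
name and contents are made up for illustration.

```r
# Stand-in for the result of readLines() on a three-line file
txt <- c("first line", "second line", "third line")

# Without collapse, paste() returns a vector of the same length
length(paste(txt))                    # 3

# With collapse, all elements are joined into one string
one_doc <- paste(txt, collapse = " ")
length(one_doc)                       # 1
one_doc                               # "first line second line third line"

# collapse = "\n" keeps the line breaks inside the single string
with_breaks <- paste(txt, collapse = "\n")
```

Which separator to use depends on whether the line structure matters later on.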

It worked fine for me. Did you get an error message, or did it just not
yield the result you expected?



Dieter Menne wrote:
 
 
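> library(tm)
> filenames = list.files(path=".", pattern="\\.txt")
> docs = character(0)  # start with an empty character vector
> for (filename in filenames){
>   # one string per file, with the original line breaks kept
>   docs = c(docs, paste(readLines(filename), collapse="\n"))
> }
> docs
> ## continue as in example
> vs = VectorSource(docs)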
 
 

If at all possible, I would recommend doing the whole procedure via
lists, not recursively. Since readLines produces a vector and a list is, in
this case, a vector of vectors, it should be no problem.
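
A list-based version might look like the sketch below. The temporary
directory and the two example files are invented so that the code runs on its
own; in practice filenames would point at your real documents.

```r
# Create two small example files in a fresh temporary directory
doc_dir <- tempfile("docs_")
dir.create(doc_dir)
writeLines(c("alpha", "beta"), file.path(doc_dir, "a.txt"))
writeLines(c("gamma", "delta", "epsilon"), file.path(doc_dir, "b.txt"))

# full.names = TRUE returns paths that readLines can open from any
# working directory
filenames <- list.files(doc_dir, pattern = "\\.txt$", full.names = TRUE)

# One list element per file, each a character vector of lines
line_list <- lapply(filenames, readLines)

# Collapse each element to one string; vapply returns a character vector
docs <- vapply(line_list, paste, character(1), collapse = "\n")
length(docs)   # 2
```

The resulting docs vector has one element per file and can be handed on in
the same way as the vector built by the loop.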

Ken
-- 
View this message in context: 
http://www.nabble.com/How-to-read-plain-text-documents-into-a-vector--tp25867792p25886956.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] How to read plain text documents into a vector?

2009-10-14 Thread kenhorvath


Dieter Menne wrote:
 
 
> While I agree that the appending could be done more efficiently with a
> list as an intermediate, the 
> 
> docs = c(docs, "ljljljl")
> 
> construct is not recursive, even if it is not efficient.
 
 
 

Yes, of course, that was hastily written, sorry ... but in my experience
a list really is more efficient.
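
For comparison, here are the two approaches side by side as a rough sketch.
The texts list is made-up data standing in for the lines of three files, and
no timings are claimed here; any difference should only become noticeable
with many documents.

```r
# Stand-in for per-file lines (illustrative data, not read from disk)
texts <- list(c("a", "b"), c("c"), c("d", "e", "f"))

# Growing a vector: every c() call builds a new, longer vector
grown <- character(0)
for (txt in texts) {
  grown <- c(grown, paste(txt, collapse = " "))
}

# Preallocated list: each slot is assigned once, then flattened at the end
slots <- vector("list", length(texts))
for (i in seq_along(texts)) {
  slots[[i]] <- paste(texts[[i]], collapse = " ")
}
filled <- unlist(slots)

identical(grown, filled)   # TRUE
```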

Ken
-- 
View this message in context: 
http://www.nabble.com/How-to-read-plain-text-documents-into-a-vector--tp25867792p25887181.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] How to read plain text documents into a vector?

2009-10-13 Thread kenhorvath



Paul Hiemstra wrote:
 
 
> file_list = list.files("/where/are/the/files")
> obj_list = lapply(file_list, FUN = yourfunction)
> 
> yourfunction is probably either read.table or some read function from 
> the tm package. So obj_list will become a list of either data.frames or 
> tm objects.
 
 

The read function that is most likely adequate here is readLines(), so
the command would read:

obj_list <- lapply(file_list, readLines)

To convert to a vector, collapse each list element into a single string and
then flatten the list:

obj_list <- lapply(obj_list, paste, collapse=" ")
obj_vec <- unlist(obj_list)  # as.vector() would leave it a list
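
The last step deserves a quick check, since as.vector leaves a list as a list
while unlist flattens it to a character vector. A small sketch follows, with
obj_list filled by hand as a stand-in for what lapply(file_list, readLines)
would return:

```r
# Stand-in for lapply(file_list, readLines): one character vector per file
obj_list <- list(c("one", "two"), c("three"))

# Collapse each element into a single string
obj_list <- lapply(obj_list, paste, collapse = " ")

is.list(as.vector(obj_list))   # TRUE: still a list
obj_vec <- unlist(obj_list)
is.character(obj_vec)          # TRUE
obj_vec                        # "one two" "three"
```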

Ken

-- 
View this message in context: 
http://www.nabble.com/How-to-read-plain-text-documents-into-a-vector--tp25867792p25870485.html
Sent from the R help mailing list archive at Nabble.com.
