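Do you mean when you encounter a new term? I would think document *length* wouldn't matter; presumably you have a list of terms already. If so, you could treat each document as a vector of integer term codes (positions in that list), then use tabulate() with nbins set to the number of terms to get the column for that document. Every column then has the same length, however long the document is.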
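A minimal sketch of the tabulate() idea, assuming the term list is known up front. The vocabulary, the documents, and the helper name term_counts are all made up for illustration, and the cells are raw counts rather than any particular weighting:

```r
# Hypothetical vocabulary and tokenized documents (illustration only).
vocab <- c("apple", "banana", "cherry", "date")
docs <- list(
  d1 = c("apple", "banana", "apple"),
  d2 = c("cherry", "apple", "date", "date", "banana", "cherry")
)

# Hypothetical helper: map tokens to integer term codes, then count them.
# nbins fixes the result at length(vocab), regardless of document length.
term_counts <- function(tokens, vocab) {
  codes <- match(tokens, vocab)            # NA for tokens not in vocab
  tabulate(codes[!is.na(codes)], nbins = length(vocab))
}

# One fixed-length column per document; rows are terms.
td <- vapply(docs, term_counts, integer(length(vocab)), vocab = vocab)
rownames(td) <- vocab
td
```

Adding a document later is then just cbind(td, term_counts(new_tokens, vocab)), with no resizing of the rows, as long as the vocabulary does not change.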
If you're using all terms that appear in any document, and you don't want to compile a list of terms first, then you might want to consider a sparse representation, as in the SparseM package, and use the sparse linear algebra routines there. Just an idea, though.

Reid Huntsinger

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Ryan Steckel
Sent: Thursday, March 17, 2005 6:01 PM
To: [email protected]
Subject: [R] TD Matrix

I'm trying to create a term document matrix where the columns are the documents, the rows are the terms in the documents, and the cells are a weight of term frequency in the document. My problem is the documents are all different lengths. So when I add a new document, if the document length is greater than the max document length in the matrix, I have to resize the matrix and do a cbind operation. Does anyone know of an easier way?

______________________________________________
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
