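Try this:

dtm <- DocumentTermMatrix(examplecorpus, control = list(wordLengths = c(1, 100)))

Term length is filtered by the wordLengths control option, which takes a minimum and a maximum length. The default minimum is 3 characters, which is why "R" and "is" were dropped. As your output shows, minWordLength = 1 is apparently not honored by this version of tm.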
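To make the effect of the length filter concrete, here is a minimal base-R sketch. It does not use tm at all. The helper filter_by_length() is hypothetical and only imitates the rule described in tm's documentation for wordLengths, where terms shorter than the minimum or longer than the maximum are discarded. The tokenizer is a naive split on single spaces, which is an assumption and not tm's tokenizer.

```r
# Hypothetical helper (NOT tm's implementation): keep only terms whose
# character count lies within [word_lengths[1], word_lengths[2]], inclusive.
filter_by_length <- function(terms, word_lengths = c(3, Inf)) {
  n <- nchar(terms)
  terms[n >= word_lengths[1] & n <= word_lengths[2]]
}

# Naive tokenization of the example documents: split on single spaces only.
docs <- c("R is good", "R is really good")
tokens <- unlist(strsplit(docs, " ", fixed = TRUE))

# With the default minimum of 3, "R" (1 char) and "is" (2 chars) are dropped.
default_terms <- unique(filter_by_length(tokens))

# With the bounds used in the reply, every term survives.
relaxed_terms <- unique(filter_by_length(tokens, c(1, 100)))

print(default_terms)
print(relaxed_terms)
```

The first vector contains only "good" and "really", matching the columns in the reported dtm. The second adds "R" and "is".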
On Wed, May 16, 2012 at 6:22 AM, C.H. <[email protected]> wrote:
> Dear All,
>
> The following code illustrates the problem.
>
> [R code]
> require(tm)
> exampledoc <- c("R is good", "R is really good")
> examplecorpus <- Corpus(VectorSource(exampledoc), encoding = "UTF-8")
> dtm <- DocumentTermMatrix(examplecorpus, control = list(minWordLength = 1))
> as.matrix(dtm)
> [/R code]
>
> The terms "R" and "is" were not included in the dtm even though the
> control parameter minWordLength was set to 1.
>
>     Terms
> Docs good really
>    1    1      0
>    2    1      1
>
> Can you reproduce this problem?
>
> The following is my sessionInfo:
>
>> sessionInfo()
> R version 2.15.0 (2012-03-30)
> Platform: i686-pc-linux-gnu (32-bit)
>
> locale:
>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
>  [7] LC_PAPER=C                 LC_NAME=C
>  [9] LC_ADDRESS=C               LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
> other attached packages:
> [1] tm_0.5-7.1
>
> loaded via a namespace (and not attached):
> [1] compiler_2.15.0 slam_0.1-23     tools_2.15.0
>
> Regards,
>
> CH
>
> ______________________________________________
> [email protected] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

