That is the minSupport argument. seq2sparse is org.apache.mahout.vectorizer.SparseVectorsFromSequenceFiles, which calls org.apache.mahout.vectorizer.DictionaryVectorizer. Search both classes for 'minSupport' and you'll see how it works.
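For a rough idea of what a minimum-support threshold does, here is a minimal plain-Java sketch. It is not Mahout code: the class and method names are made up for illustration, and it assumes minSupport means the minimum number of times a term must occur across the whole corpus to be kept. The real logic runs as Hadoop jobs inside DictionaryVectorizer, so check the source for the exact behavior.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustration only, not Mahout code. All names here are hypothetical.
class MinSupportSketch {

    // Count how often each term occurs across the whole corpus.
    static Map<String, Integer> countTerms(List<List<String>> docs) {
        Map<String, Integer> counts = new HashMap<>();
        for (List<String> doc : docs) {
            for (String term : doc) {
                counts.merge(term, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Keep only the tokens whose corpus-wide count is at least minSupport
    // (assumed semantics; verify against DictionaryVectorizer).
    static List<List<String>> applyMinSupport(List<List<String>> docs, int minSupport) {
        Map<String, Integer> counts = countTerms(docs);
        List<List<String>> filtered = new ArrayList<>();
        for (List<String> doc : docs) {
            List<String> kept = new ArrayList<>();
            for (String term : doc) {
                if (counts.get(term) >= minSupport) {
                    kept.add(term);
                }
            }
            filtered.add(kept);
        }
        return filtered;
    }

    public static void main(String[] args) {
        // Three tiny, already-tokenized documents.
        List<List<String>> docs = Arrays.asList(
            Arrays.asList("mahout", "vector", "sparse"),
            Arrays.asList("mahout", "vector"),
            Arrays.asList("hadoop"));
        System.out.println(applyMinSupport(docs, 2));
    }
}
```

In this sketch the threshold applies to terms, so a short document can end up with an empty term list instead of being dropped from the output.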
On Wed, Mar 7, 2012 at 12:34 PM, Baoqiang Cao <[email protected]> wrote:
> Hi,
>
> I wonder if in seq2sparse step I could set a criteria for the minimum
> number of words (after stop words) a document must have. Any help,
> please?
>
> Best,
> Bao

--
Lance Norskog
[email protected]
