The closest option is the minSupport argument (-s / --minSupport on the
command line). seq2sparse is implemented by
org.apache.mahout.vectorizer.SparseVectorsFromSequenceFiles, which
calls org.apache.mahout.vectorizer.DictionaryVectorizer. Search those
classes for 'minSupport' and you'll see how it works.
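
As far as I understand it, minSupport is a threshold on how often a term
occurs across the whole corpus: terms seen fewer times than that are left
out of the dictionary. Here is a rough standalone sketch of that idea. It
is not Mahout code; the class and method names (MinSupportSketch,
countTerms, prune) are made up for illustration, and it assumes documents
are already tokenized with stop words removed.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: not Mahout code. All names here are hypothetical.
public class MinSupportSketch {

    // Count how many times each term occurs across the whole corpus.
    static Map<String, Integer> countTerms(List<List<String>> docs) {
        Map<String, Integer> counts = new HashMap<>();
        for (List<String> doc : docs) {
            for (String term : doc) {
                counts.merge(term, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Keep only terms whose corpus-wide count is at least minSupport.
    // Documents are never dropped here, only terms.
    static List<List<String>> prune(List<List<String>> docs, int minSupport) {
        Map<String, Integer> counts = countTerms(docs);
        List<List<String>> pruned = new ArrayList<>();
        for (List<String> doc : docs) {
            List<String> kept = new ArrayList<>();
            for (String term : doc) {
                if (counts.get(term) >= minSupport) {
                    kept.add(term);
                }
            }
            pruned.add(kept);
        }
        return pruned;
    }

    public static void main(String[] args) {
        List<List<String>> docs = List.of(
            List.of("mahout", "vector", "sparse"),
            List.of("mahout", "vector"),
            List.of("mahout", "rare"));
        // prints [[mahout, vector], [mahout, vector], [mahout]]
        System.out.println(prune(docs, 2));
    }
}
```

Note that in this sketch a short document stays in the output even when
most of its terms are pruned, so check the source to confirm whether that
matches what you need.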



On Wed, Mar 7, 2012 at 12:34 PM, Baoqiang Cao <[email protected]> wrote:
> Hi,
>
> I wonder whether, in the seq2sparse step, I could set a criterion for
> the minimum number of words (after stop-word removal) a document must
> have. Any help, please?
>
> Best,
> Bao



-- 
Lance Norskog
[email protected]
