Hello!

I am having trouble running the "seq2sparse" example with TFIDF weights. My
TF vectors are OK, but the TFIDF vectors are 10 times smaller. It looks like
seq2sparse cuts my terms during the TFxIDF step: Document1 has 20 terms in
the TF vector, but only 2 terms in the TFIDF vector. What is wrong? I have
spent 2 days looking for the answer and tuning the seq2sparse parameters ((

Thanks in advance!

mahout seq2sparse -ow  \
-chunk 512 \
--maxDFPercent 90 \
--maxNGramSize 1 \
--numReducers 128 \
--minSupport 150 \
-i --- \
-o --- \
-wt tfidf \
--namedVector \
-a org.apache.lucene.analysis.WhitespaceAnalyzer
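
To make the symptom concrete, here is a toy Python sketch of how a
document-frequency cutoff applied at the TFxIDF step could shrink a vector.
This is only an assumption about the cause, not Mahout's actual code: the
function names are hypothetical, the cutoff is modelled loosely on
--maxDFPercent, and the IDF is the textbook log(N/df) rather than whatever
formula Mahout uses.

```python
import math

def tf_vectors(docs):
    """TF step: count term occurrences per tokenized document."""
    vectors = []
    for tokens in docs:
        counts = {}
        for term in tokens:
            counts[term] = counts.get(term, 0) + 1
        vectors.append(counts)
    return vectors

def tfidf_vectors(tf, max_df_percent=90, min_df=1):
    """TFxIDF step with a hypothetical document-frequency filter.

    Terms that occur in more than max_df_percent of the documents, or in
    fewer than min_df documents, are dropped from every vector.
    """
    n_docs = len(tf)
    # Document frequency: in how many documents each term appears.
    df = {}
    for vec in tf:
        for term in vec:
            df[term] = df.get(term, 0) + 1
    max_df = n_docs * max_df_percent / 100.0
    result = []
    for vec in tf:
        kept = {
            term: count * math.log(n_docs / df[term])
            for term, count in vec.items()
            if min_df <= df[term] <= max_df
        }
        result.append(kept)
    return result

docs = [["a", "b", "c"], ["a", "b", "d"], ["a", "e"]]
tf = tf_vectors(docs)
tfidf = tfidf_vectors(tf, max_df_percent=50)
print(len(tf[0]), len(tfidf[0]))  # prints "3 1"
```

In this toy corpus the first document keeps 3 terms in the TF vector but
only 1 in the TFIDF vector, because "a" and "b" appear in more than 50% of
the documents. Whether something similar explains my 20 vs. 2 terms is
exactly what I cannot tell.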

Pavel
