I see the same phenomenon.
Basically, tokenization and tf vector creation go fine, but
generating the tfidf vectors from the tf ones filters out many terms.
In an earlier experiment (without setting any parameter except -nv), I
saw that terms with tf <= 2.0 were filtered out.
In my current experiment I cannot detect such a pattern.
Please tell me whether I am missing something here.
Thanks,
Yuval
