I see the same phenomenon. Tokenization and tf vector creation go fine, but generating the tfidf vectors from the tf vectors filters out many terms. In an earlier experiment (without setting any parameter except -nv), I saw that terms with tf <= 2.0 were filtered out. In my current experiment I cannot detect such a pattern. Please tell me whether I am missing something here. Thanks, Yuval
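One possible explanation worth checking is an assumption, not a description of seq2sparse's actual code. The idea is that a tfidf step can drop terms that were present in the tf vectors for two separate reasons. First, it may prune by document frequency. Second, any term whose weight comes out as exactly 0 disappears from a sparse vector. The sketch below illustrates both effects on plain Java maps. The class name `TfIdfPruningSketch`, the method `tfidf`, and the thresholds `minDf` and `maxDfPercent` are hypothetical illustrations. The IDF formula here is plain log(N/df), and the real implementation may use a different one.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: shows how terms present in tf vectors can vanish
// from tfidf vectors. Not Mahout's implementation.
public class TfIdfPruningSketch {

    // Converts per-document term-frequency maps into tfidf maps.
    // minDf and maxDfPercent are hypothetical pruning thresholds.
    static List<Map<String, Double>> tfidf(List<Map<String, Double>> tfDocs,
                                           int minDf, int maxDfPercent) {
        int numDocs = tfDocs.size();

        // Document frequency: number of documents each term occurs in.
        Map<String, Integer> df = new HashMap<>();
        for (Map<String, Double> doc : tfDocs) {
            for (String term : doc.keySet()) {
                df.merge(term, 1, Integer::sum);
            }
        }

        List<Map<String, Double>> result = new ArrayList<>();
        for (Map<String, Double> doc : tfDocs) {
            Map<String, Double> weights = new HashMap<>();
            for (Map.Entry<String, Double> e : doc.entrySet()) {
                int termDf = df.get(e.getKey());
                boolean tooRare = termDf < minDf;
                // Integer arithmetic: df / numDocs > maxDfPercent / 100
                boolean tooCommon = termDf * 100 > maxDfPercent * numDocs;
                if (tooRare || tooCommon) {
                    continue; // pruned by document frequency
                }
                double idf = Math.log((double) numDocs / termDf);
                double w = e.getValue() * idf;
                // A sparse vector typically stores no zero entries, so a
                // term occurring in every document (idf = 0) is lost here.
                if (w != 0.0) {
                    weights.put(e.getKey(), w);
                }
            }
            result.add(weights);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Map<String, Double>> tf = new ArrayList<>();
        tf.add(Map.of("mahout", 3.0, "vector", 1.0, "the", 5.0));
        tf.add(Map.of("mahout", 1.0, "the", 4.0));
        tf.add(Map.of("tfidf", 2.0, "the", 2.0));
        for (Map<String, Double> doc : tfidf(tf, 1, 99)) {
            System.out.println(doc.keySet()); // iteration order is unspecified
        }
    }
}
```

In this toy corpus, "the" occurs in all three documents. With maxDfPercent = 99 it is pruned by document frequency. With maxDfPercent = 100 it still disappears, because log(3/3) = 0 gives it a zero weight. Raising minDf to 2 would additionally drop "vector" and "tfidf". Neither effect depends on a term's tf value, which could explain why no tf-based pattern shows up.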
