Hi,

I am using TfidfVectorizer from sklearn.feature_extraction.text to
generate the tf-idf matrix of a corpus. However, when I look at the
features extracted from my corpus, the vocabulary appears to have shrunk
from 1860 terms to 598. I tried changing max_df, min_df, and max_features,
but the number of features did not change.

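from sklearn.feature_extraction.text import TfidfVectorizer

# ngram (maximum n-gram length) and corpus are defined elsewhere
tf = TfidfVectorizer(ngram_range=(1, ngram), use_idf=False)
tf_matrix = tf.fit_transform(corpus)
feature_names = tf.get_feature_names()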
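
One way to narrow this down is to compare the two vocabulary counts outside the vectorizer. The sketch below uses only the standard library and mimics two TfidfVectorizer defaults: lowercasing, and the token pattern (?u)\b\w\w+\b, which keeps only tokens of two or more word characters. It assumes the 1860 figure came from a plain whitespace split, which the message does not state. The toy corpus and the helper names whitespace_vocab and default_style_vocab are hypothetical.

```python
import re

# Default token_pattern of TfidfVectorizer: tokens of 2+ word characters.
DEFAULT_TOKEN_PATTERN = r"(?u)\b\w\w+\b"


def whitespace_vocab(docs):
    """Vocabulary from a plain whitespace split (case and punctuation kept)."""
    return {tok for doc in docs for tok in doc.split()}


def default_style_vocab(docs):
    """Vocabulary built the way the defaults would: lowercase, then regex tokens."""
    pattern = re.compile(DEFAULT_TOKEN_PATTERN)
    return {tok for doc in docs for tok in pattern.findall(doc.lower())}


# Hypothetical toy corpus, not the data from the message.
toy_corpus = ["The cat sat on a mat.", "the Cat ate a fish, I think"]

raw = whitespace_vocab(toy_corpus)
vec = default_style_vocab(toy_corpus)

# Compare sizes and list the raw tokens that have no exact counterpart.
print(len(raw), len(vec))
print(sorted(raw - vec))
```

If the gap on the real corpus has the same causes, passing lowercase=False or a custom token_pattern to TfidfVectorizer would be the settings to experiment with; max_df, min_df, and max_features only prune terms after tokenization.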

Does anyone have an idea how to solve this problem?

Thank you,

Ehsan