Re: [Scikit-learn-general] [TfidfVectorizer problem]

Andreas Mueller Thu, 19 Nov 2015 08:57:07 -0800

Hi Ehsan.
Which version of scikit-learn are you using?
And why do you think the vocabulary size is 1860?
What is len(tf.vocabulary_)?
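For reference, here is a minimal sketch of that check. It assumes a hypothetical toy corpus and ngram = 2, neither of which comes from the original message. len(tf.vocabulary_) counts every kept term, unigrams and n-grams alike, and it equals the number of columns in the matrix returned by fit_transform.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical toy corpus; substitute your own documents.
corpus = [
    "The cat sat on the mat.",
    "A dog and a cat played.",
]
ngram = 2  # hypothetical value for the ngram variable in the question

tf = TfidfVectorizer(ngram_range=(1, ngram), use_idf=False)
tf_matrix = tf.fit_transform(corpus)

# vocabulary_ maps each kept term (unigram or n-gram) to its column index,
# so its length equals the number of columns of the tf-idf matrix.
vocab_size = len(tf.vocabulary_)
print(vocab_size, tf_matrix.shape)
```

Checking tf_matrix.shape[1] avoids depending on get_feature_names, which later scikit-learn releases replaced with get_feature_names_out.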


Andy

On 11/18/2015 11:45 PM, Ehsan Asgari wrote:

Hi,
I am using TfidfVectorizer from sklearn.feature_extraction.text to generate the tf-idf matrix of a corpus. However, when I look at the features extracted from my corpus, it seems to have reduced my vocabulary size from 1860 to 598! I tried playing with max_df, min_df, and max_features, but nothing changed.

    tf = TfidfVectorizer(ngram_range=(1, ngram), use_idf=False)
    tf_matrix = tf.fit_transform(corpus)
    feature_names = tf.get_feature_names()

Does anyone have an idea how to solve this problem?

Thank you,

Ehsan
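One possible explanation is that the 1860 figure comes from a different tokenization than the vectorizer's. This is only an assumption, since the actual corpus is not shown. By default, TfidfVectorizer lowercases the text (lowercase=True) and keeps only tokens matching token_pattern=r"(?u)\b\w\w+\b". As a result, punctuation is stripped and one-character words are dropped. This happens during tokenization, before max_df, min_df and max_features filter anything. The sketch below uses a hypothetical corpus to compare three counts: a naive whitespace split, the default pattern, and a more permissive pattern.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical corpus; the real corpus from the question is not available.
corpus = [
    "I saw a Cat.",
    "The cat saw me, I think.",
]

# Naive count: split on whitespace, keeping case and punctuation.
naive_vocab = {word for doc in corpus for word in doc.split()}

# Default settings: lowercase=True and token_pattern=r"(?u)\b\w\w+\b",
# which drops punctuation and one-character tokens such as "I" and "a".
default_tf = TfidfVectorizer(use_idf=False).fit(corpus)

# Assumed alternative: a pattern that also keeps one-character tokens.
permissive_tf = TfidfVectorizer(
    use_idf=False, token_pattern=r"(?u)\b\w+\b"
).fit(corpus)

print(len(naive_vocab), len(default_tf.vocabulary_),
      len(permissive_tf.vocabulary_))
print(sorted(default_tf.vocabulary_))
```

With ngram_range=(1, ngram) and ngram > 1, the vocabulary also contains n-grams. Its size is therefore not directly comparable with a plain word count.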




------------------------------------------------------------------------------


_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
