The settings differ: min_df=2 in the second (0.13) and min_df=1 in the first
(0.14-git). With min_df=1, terms that occur in only one document are kept,
which would likely account for the extra vocabulary entries.

On Thu, Mar 14, 2013 at 7:19 PM, Ark <[email protected]> wrote:
>
>>
>> This is unexpected. Can you inspect the vocabulary_ on both
>> vectorizers? Try computing their set.intersection, set.difference,
>> set.symmetric_difference (all Python builtins).
>>
>
> In [17]: len(set.symmetric_difference(set(vect13.vocabulary_.keys()),
>                                       set(vect14.vocabulary_.keys())))
> Out[17]: 42529
>
> I skimmed over the list of the keys, and the values seem to be what should
> be in the document, so I am not sure why exactly I did not have them
> earlier; I will continue to analyze if I see any discrepancy.
>
> For clarity I am adding the complete vectorizer:
>
> with scikit 0.14-git:
> Extracting features from dataset using TfidfVectorizer(analyzer=word,
>         binary=False, charset=utf-8,
>         charset_error=strict, dtype=<type 'numpy.int64'>, input=content,
>         lowercase=True, max_df=0.5, max_features=None, min_df=1,
>         ngram_range=(1, 2), norm=l2, preprocessor=None, smooth_idf=True,
>         stop_words=english, strip_accents=None, sublinear_tf=True,
>         token_pattern=\b(?!\d)\w\w+\b, tokenizer=None, use_idf=False,
>         vocabulary=None).
>
> with scikit 0.13:
> Extracting features from dataset using TfidfVectorizer(analyzer=word,
>         binary=False, charset=utf-8,
>         charset_error=strict, dtype=<type 'long'>, input=content,
>         lowercase=True, max_df=0.5, max_features=None, max_n=None,
>         min_df=2, min_n=None, ngram_range=(1, 2), norm=l2,
>         preprocessor=None, smooth_idf=True, stop_words=english,
>         strip_accents=None, sublinear_tf=True,
>         token_pattern=\b(?!\d)\w\w+\b, tokenizer=None, use_idf=False,
>         vocabulary=None)
>
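To illustrate how a change in min_df alone can produce a vocabulary difference
of this kind, here is a minimal plain-Python sketch. It is not scikit-learn's
implementation: build_vocabulary, the whitespace tokenizer, and the toy
documents are all hypothetical stand-ins, and only the "keep terms that appear
in at least min_df documents" rule is modeled.

```python
from collections import Counter


def build_vocabulary(docs, min_df=1):
    """Return the terms that appear in at least `min_df` documents.

    Hypothetical helper: uses a naive lowercase/whitespace tokenizer,
    not the TfidfVectorizer analyzer.
    """
    document_frequency = Counter()
    for doc in docs:
        # Count each term once per document (document frequency,
        # not term frequency).
        document_frequency.update(set(doc.lower().split()))
    return {term for term, df in document_frequency.items() if df >= min_df}


# Toy corpus, made up for illustration only.
docs = [
    "the quick brown fox",
    "the lazy dog",
    "the quick dog barks",
]

vocab_min_df_1 = build_vocabulary(docs, min_df=1)
vocab_min_df_2 = build_vocabulary(docs, min_df=2)

# Terms present in exactly one of the two vocabularies, analogous to
# the set.symmetric_difference check quoted above.
only_in_one = vocab_min_df_1 ^ vocab_min_df_2
print(sorted(only_in_one))  # ['barks', 'brown', 'fox', 'lazy']
```

Every term in the symmetric difference here occurs in exactly one document, so
it survives min_df=1 but is dropped by min_df=2. With ngram_range=(1, 2) on a
real corpus, many bigrams occur in a single document, so a large difference
would not be surprising.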
------------------------------------------------------------------------------
Everyone hates slow websites. So do we.
Make your web apps faster with AppDynamics
Download AppDynamics Lite for free today:
http://p.sf.net/sfu/appdyn_d2d_mar
_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
