Did you see my earlier reply?
Roman Sinayev <[email protected]> wrote:
>min_df=2 in the second and min_df=1 in the first.
>
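To illustrate why the min_df mismatch alone can account for a large vocabulary difference, here is a minimal pure-Python sketch of document-frequency filtering. It is not scikit-learn's implementation: `build_vocabulary` is a hypothetical helper, the tokenization is a plain whitespace split rather than the token_pattern quoted below, and the three documents are made up.

```python
from collections import Counter


def build_vocabulary(docs, min_df=1):
    """Keep terms that appear in at least `min_df` documents (hypothetical helper)."""
    doc_freq = Counter()
    for doc in docs:
        # Count each term once per document, i.e. its document frequency.
        doc_freq.update(set(doc.lower().split()))
    return {term for term, count in doc_freq.items() if count >= min_df}


# Made-up toy corpus, only for illustration.
docs = [
    "the cat sat",
    "the dog sat",
    "a bird flew",
]

vocab_min_df_1 = build_vocabulary(docs, min_df=1)
vocab_min_df_2 = build_vocabulary(docs, min_df=2)

# Terms that occur in a single document survive only with min_df=1.
only_with_min_df_1 = vocab_min_df_1 - vocab_min_df_2
print(sorted(only_with_min_df_1))
```

With ngram_range=(1, 2), many bigrams occur in only one document, so the gap between min_df=1 and min_df=2 would likely be much larger on a real corpus than in this toy case.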
>On Thu, Mar 14, 2013 at 7:19 PM, Ark <[email protected]> wrote:
>>
>>>
>>> This is unexpected. Can you inspect the vocabulary_ on both
>>> vectorizers? Try computing their set.intersection, set.difference,
>>> set.symmetric_difference (all Python builtins).
>>>
>>
>> In [17]: len(set.symmetric_difference(set(vect13.vocabulary_.keys()),
>> set(vect14.vocabulary_.keys())))
>> Out[17]: 42529
>>
>> I skimmed over the list of keys, and the values seem to be what should
>> be in the document, so I am not sure exactly why I did not have them
>> earlier; I will continue to analyze and see if there is any discrepancy.
>>
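The symmetric difference gives only a total count. A small sketch like the following splits it into the terms unique to each side, which shows which vectorizer is the one holding the extra entries. The two dictionaries are hypothetical stand-ins for vect13.vocabulary_ and vect14.vocabulary_, and `compare_vocabularies` is an illustrative helper, not part of scikit-learn.

```python
def compare_vocabularies(vocab_a, vocab_b):
    """Split two term->index mappings into shared and one-sided terms."""
    keys_a, keys_b = set(vocab_a), set(vocab_b)
    return {
        "shared": keys_a & keys_b,
        "only_a": keys_a - keys_b,
        "only_b": keys_b - keys_a,
    }


# Hypothetical stand-ins for vect13.vocabulary_ and vect14.vocabulary_.
vocab_13 = {"cat": 0, "sat": 1}
vocab_14 = {"cat": 0, "cat sat": 1, "dog": 2, "sat": 3}

report = compare_vocabularies(vocab_13, vocab_14)
for name in ("shared", "only_a", "only_b"):
    print(name, len(report[name]), sorted(report[name]))

# The symmetric difference is the union of the two one-sided sets.
symmetric = report["only_a"] | report["only_b"]
```

If one of the one-sided sets is empty, one vocabulary is a subset of the other, which is what a stricter min_df on that side would produce.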
>> For clarity I am adding the complete vectorizer:
>>
>> with scikit 0.14-git:
>> Extracting features from dataset using TfidfVectorizer(analyzer=word,
>> binary=False, charset=utf-8,
>> charset_error=strict, dtype=<type 'numpy.int64'>, input=content,
>> lowercase=True, max_df=0.5, max_features=None, min_df=1,
>> ngram_range=(1, 2), norm=l2, preprocessor=None, smooth_idf=True,
>> stop_words=english, strip_accents=None, sublinear_tf=True,
>> token_pattern=\b(?!\d)\w\w+\b, tokenizer=None, use_idf=False,
>> vocabulary=None).
>>
>> with scikit 0.13
>> Extracting features from dataset using TfidfVectorizer(analyzer=word,
>> binary=False, charset=utf-8,
>> charset_error=strict, dtype=<type 'long'>, input=content,
>> lowercase=True, max_df=0.5, max_features=None, max_n=None,
>> min_df=2, min_n=None, ngram_range=(1, 2), norm=l2,
>> preprocessor=None, smooth_idf=True, stop_words=english,
>> strip_accents=None, sublinear_tf=True,
>> token_pattern=\b(?!\d)\w\w+\b, tokenizer=None, use_idf=False,
>> vocabulary=None)
>>
>>
>>
>------------------------------------------------------------------------------
>> Everyone hates slow websites. So do we.
>> Make your web apps faster with AppDynamics
>> Download AppDynamics Lite for free today:
>> http://p.sf.net/sfu/appdyn_d2d_mar
>> _______________________________________________
>> Scikit-learn-general mailing list
>> [email protected]
>> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
--
This message was sent from my Android phone with K-9 Mail.