2013/3/15 Ark <[email protected]>:
> writes:
>
>>
>> did you see my earlier reply?
>>
> Ah, you are right, sorry about that...any particular reason we reset the
> value?
This is explained in the changelog:
http://scikit-learn.org/dev/whats_new.html#changes-0-14
namely, to be less confusing.
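
If you want the 0.13 behaviour back, passing min_df=2 explicitly should
do it. A minimal sketch, reusing the parameters from the first mail in
this thread:

from sklearn.feature_extraction.text import TfidfVectorizer

# 0.14-git defaults to min_df=1; min_df=2 restores the old 0.13 default
vectorizer = TfidfVectorizer(stop_words='english', ngram_range=(1, 2),
                             smooth_idf=True, sublinear_tf=True,
                             max_df=0.5, min_df=2,
                             token_pattern=ur'\b(?!\d)\w\w+\b')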
writes:
>
> did you see my earlier reply?
>
Ah, you are right, sorry about that...any particular reason we reset the
value?
did you see my earlier reply?
Roman Sinayev wrote:
> min_df=2 in the second and min_df=1 in the first.
>
> On Thu, Mar 14, 2013 at 7:19 PM, Ark <[email protected]> wrote:
>>
>>> This is unexpected. Can you inspect the vocabulary_ on both
>>> vectorizers? Try computing their set.intersection, set.difference,
>>> set.symmetric_difference (all Python builtins).
min_df=2 in the second and min_df=1 in the first.
On Thu, Mar 14, 2013 at 7:19 PM, Ark <[email protected]> wrote:
>
>>
>> This is unexpected. Can you inspect the vocabulary_ on both
>> vectorizers? Try computing their set.intersection, set.difference,
>> set.symmetric_difference (all Python builtins).
>
> This is unexpected. Can you inspect the vocabulary_ on both
> vectorizers? Try computing their set.intersection, set.difference,
> set.symmetric_difference (all Python builtins).
>
In [17]: len(set.symmetric_difference(set(vect13.vocabulary_.keys()),
   ....:     set(vect14.vocabulary_.keys())))
Out[17]:
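
The remaining set operations from that suggestion can be run the same
way (a sketch only, output omitted; vect13 / vect14 are the vectorizers
fitted under 0.13.1 and 0.14-git):

vocab13 = set(vect13.vocabulary_)  # the dict keys are the terms
vocab14 = set(vect14.vocabulary_)

print len(vocab13 & vocab14)  # terms present in both vocabularies
print len(vocab14 - vocab13)  # terms only the 0.14-git vectorizer kept
print len(vocab13 - vocab14)  # terms only the 0.13.1 vectorizer kept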
2013/3/14 Ark <[email protected]>:
> For:
> vectorizer = TfidfVectorizer(stop_words='english', ngram_range=(1,2),
>                              smooth_idf=True, sublinear_tf=True, max_df=0.5,
>                              token_pattern=ur'\b(?!\d)\w\w+\b')
>
> On fit_transform the shape of the input data
> - with version 0.13.
This is weird. Are you sure it is not the other way around?
The min_df parameter was reset from 2 to 1 afaik, which should give you
a larger vocabulary in the git version, not a smaller one.
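
A toy corpus makes the direction of the change easy to check (a minimal
sketch; the documents are made up):

from sklearn.feature_extraction.text import TfidfVectorizer

docs = ['apple banana', 'apple cherry', 'banana cherry', 'date']

# min_df=1 keeps every term; min_df=2 drops terms that occur in fewer
# than two documents ('date' here), so it can only shrink the vocabulary
for min_df in (1, 2):
    vect = TfidfVectorizer(min_df=min_df)
    vect.fit(docs)
    print min_df, len(vect.vocabulary_)

This should print 4 terms for min_df=1 and 3 for min_df=2, i.e. the git
default should produce the larger vocabulary, not the smaller one.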
On 03/14/2013 04:11 AM, Ark wrote:
The vectorized input with the same training data set differs with versions
0.13.1 and 0.14-git.

For:
vectorizer = TfidfVectorizer(stop_words='english', ngram_range=(1,2),
                             smooth_idf=True, sublinear_tf=True, max_df=0.5,
                             token_pattern=ur'\b(?!\d)\w\w+\b')

On fit_transform the shape of the input data
- with version 0.13.