Re: [Scikit-learn-general] Vectorizing input

2013-03-15 Thread Olivier Grisel
2013/3/15 Ark <[email protected]>:
>> did you see my earlier reply?
>
> Ah, you are right, sorry about that... any particular reason we reset the
> value?

This is explained in the changelog: http://scikit-learn.org/dev/whats_new.html#changes-0-14 namely to be less confusing

Re: [Scikit-learn-general] Vectorizing input

2013-03-15 Thread Ark
> did you see my earlier reply?

Ah, you are right, sorry about that... any particular reason we reset the value?

Re: [Scikit-learn-general] Vectorizing input

2013-03-14 Thread amueller
did you see my earlier reply?

Roman Sinayev wrote:
> min_df=2 in the second and min_df=1 in the first.
>
> On Thu, Mar 14, 2013 at 7:19 PM, Ark <[email protected]> wrote:
>>> This is unexpected. Can you inspect the vocabulary_ on both
>>> vectorizers? Try computing their set.intersectio

Re: [Scikit-learn-general] Vectorizing input

2013-03-14 Thread Roman Sinayev
min_df=2 in the second and min_df=1 in the first.

On Thu, Mar 14, 2013 at 7:19 PM, Ark <[email protected]> wrote:
>> This is unexpected. Can you inspect the vocabulary_ on both
>> vectorizers? Try computing their set.intersection, set.difference,
>> set.symmetric_difference (all Python builti

Re: [Scikit-learn-general] Vectorizing input

2013-03-14 Thread Ark
> This is unexpected. Can you inspect the vocabulary_ on both
> vectorizers? Try computing their set.intersection, set.difference,
> set.symmetric_difference (all Python builtins).

In [17]: len(set.symmetric_difference(set(vect13.vocabulary_.keys()), set(vect14.vocabulary_.keys(
Out[17
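The set comparison suggested in this message can be sketched in plain Python. The two dictionaries below are made-up stand-ins for the vocabulary_ attributes of the two fitted vectorizers (vect13 and vect14 in the session above); their contents are an assumption for illustration, not data from the thread.

```python
# Hypothetical term -> column-index mappings, shaped like a vectorizer's vocabulary_.
vocab_13 = {"apple": 0, "banana": 1, "cherry": 2}
vocab_14 = {"banana": 0, "cherry": 1, "date": 2}

# Iterating a dict yields its keys, so set(d) is the set of terms.
terms_13 = set(vocab_13)
terms_14 = set(vocab_14)

shared = terms_13 & terms_14      # set.intersection
only_in_13 = terms_13 - terms_14  # set.difference
only_in_14 = terms_14 - terms_13
differing = terms_13 ^ terms_14   # set.symmetric_difference

print(len(shared), len(only_in_13), len(only_in_14), len(differing))
```

Looking at only_in_13 and only_in_14 separately, rather than only at the size of the symmetric difference, shows which side gained or lost terms.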

Re: [Scikit-learn-general] Vectorizing input

2013-03-14 Thread Lars Buitinck
2013/3/14 Ark <[email protected]>:
> For:
> vectorizer = TfidfVectorizer(stop_words='english', ngram_range=(1,2),
> smooth_idf=True, sublinear_tf=True, max_df=0.5,
> token_pattern=ur'\b(?!\d)\w\w+\b')
>
> On fit_transform the shape of the input data
> - with version 0.13.
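As a side note on the token_pattern quoted above, here is a small sketch of what that regular expression accepts, using only the standard re module. The sample string is invented for illustration. The ur'' prefix in the quoted call is Python 2 syntax; under Python 3 a plain r'' string is used instead.

```python
import re

# Same pattern as in the quoted TfidfVectorizer call: tokens of two or more
# word characters that do not start with a digit.
token_pattern = re.compile(r"\b(?!\d)\w\w+\b")

sample = "abc 123 a1 1a x py3"  # made-up input
tokens = token_pattern.findall(sample)
print(tokens)
```

Purely numeric tokens, tokens starting with a digit, and single-character tokens are skipped, so this pattern by itself already narrows the set of candidate terms.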

Re: [Scikit-learn-general] Vectorizing input

2013-03-14 Thread Andreas Mueller
This is weird. Are you sure it is not the other way around? The min_df parameter was reset from 2 to 1 afaik, which should give you a larger vocabulary in the git version, not a smaller one.

On 03/14/2013 04:11 AM, Ark wrote:
> The vectorized input with the same training data set differs with versio
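The reasoning about min_df can be illustrated with a toy document-frequency filter. This is a plain-Python sketch, not scikit-learn's implementation: prune_by_min_df and the sample documents are hypothetical, and min_df is treated as an absolute document count.

```python
from collections import Counter


def prune_by_min_df(tokenized_docs, min_df):
    """Keep terms that occur in at least `min_df` documents (hypothetical helper)."""
    doc_freq = Counter()
    for tokens in tokenized_docs:
        # Count each term at most once per document.
        doc_freq.update(set(tokens))
    return {term for term, count in doc_freq.items() if count >= min_df}


# Made-up, already tokenized documents.
docs = [["red", "apple"], ["green", "apple"], ["red", "pear"]]

vocab_min_df_1 = prune_by_min_df(docs, 1)
vocab_min_df_2 = prune_by_min_df(docs, 2)
print(len(vocab_min_df_1), len(vocab_min_df_2))
```

With everything else held equal, lowering the threshold from 2 to 1 can only add terms, which is why a smaller vocabulary in the newer version would be surprising.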