Re: [Scikit-learn-general] tf-idf changes

Jaques Grobler Tue, 27 Mar 2012 02:11:23 -0700

Thanks a lot. I've let the author know

J


On 26 March 2012 14:14, Jaques Grobler <[email protected]> wrote:
> > Hi everyone-
> >
> > I stumbled upon this post that offers a quick run-through of
> > text-feature-extraction using
> > sklearn.feature_extraction.text's CountVectorizer:
> >
> > http://pyevolve.sourceforge.net/wordpress/?p=1589&cpage=1#comment-15857
> >
> > Upon copying the code into ipython, I get different outputs from him. It
> > appears as though there have been
> > changes to this module since he made this post, but I don't see anything in
> > the change-log, unless I'm missing it.
>
> The module has been completely refactored in master as stated in the
> changelog:
>
> http://scikit-learn.org/dev/whats_new.html
>
> > Just want to give the guy a heads-up about it. Can anyone point me in a
> > direction or help here?
>
> In particular the IDF smoothing used to cause negative values for
> highly frequent words (very rare in practice). The fix that I used
> makes all IDF values larger than 1.0. This might not be as canonical
> as it should be but I tried other alternatives and they tended to
> decrease the quality of the KMeans results in the text clustering
> example...
>
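
To make the IDF point concrete, here is a minimal sketch in plain Python. Both formulas are assumptions made for illustration: the thread does not spell out either the old or the new expression. The first, log(n / (1 + df)), is one common smoothing that goes negative when a term occurs in every document; the second, log((1 + n) / (1 + df)) + 1, is one way to keep every value at or above 1.0. The helper names are hypothetical.

```python
import math


def idf_plus_one_in_denominator(n_docs, df):
    """Assumed 'old-style' smoothing: log(n / (1 + df)).

    Negative whenever df == n_docs, i.e. the term is in every document.
    """
    return math.log(n_docs / (1 + df))


def idf_shifted_by_one(n_docs, df):
    """Assumed fix: log((1 + n) / (1 + df)) + 1.

    Since df <= n_docs, the log term is >= 0, so the result is >= 1.0.
    """
    return math.log((1 + n_docs) / (1 + df)) + 1.0


n_docs = 4
for df in (1, 2, 4):
    old = idf_plus_one_in_denominator(n_docs, df)
    new = idf_shifted_by_one(n_docs, df)
    print("df=%d  old=%.3f  new=%.3f" % (df, old, new))
```

With four documents, a term present in all four gets a negative weight under the first formula and exactly 1.0 under the second, so very frequent terms are down-weighted relative to rare ones without ever flipping sign.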
> Also the fitted vocabulary has been renamed to vocabulary_ to respect
> the fit semantics of the rest of the project.
>
> --
> Olivier
> http://twitter.com/ogrisel - http://github.com/ogrisel
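
The "fit semantics" mentioned above refer to the convention that state learned from data lives in attributes with a trailing underscore, which exist only after fit has been called. The sketch below illustrates that convention with a hypothetical ToyCountVectorizer class. It is not scikit-learn's implementation, and its tokenization (lowercasing, splitting on word characters) is an assumption chosen for brevity.

```python
import re


class ToyCountVectorizer:
    """Hypothetical stand-in used only to illustrate the naming convention."""

    def fit(self, documents):
        # Learned state gets a trailing underscore and is created in fit().
        tokens = sorted(
            {tok for doc in documents for tok in re.findall(r"\w+", doc.lower())}
        )
        self.vocabulary_ = {tok: idx for idx, tok in enumerate(tokens)}
        return self

    def transform(self, documents):
        # Count occurrences of each known token; unknown tokens are ignored.
        rows = []
        for doc in documents:
            row = [0] * len(self.vocabulary_)
            for tok in re.findall(r"\w+", doc.lower()):
                idx = self.vocabulary_.get(tok)
                if idx is not None:
                    row[idx] += 1
            rows.append(row)
        return rows


docs = ["the sky is blue", "the sun is bright"]
vec = ToyCountVectorizer()
has_vocab_before_fit = hasattr(vec, "vocabulary_")  # attribute absent until fit
vec.fit(docs)
counts = vec.transform(docs)
print(vec.vocabulary_)
print(counts)
```

Code written against the older attribute name would need to read vocabulary_ from the fitted vectorizer instead.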
