I learned how to use scikit-learn's text feature extraction from the example
on topic extraction with NMF:
http://scikit-learn.sourceforge.net/dev/auto_examples/applications/topics_extraction_with_nmf.html

Gael: I find that some of the normalizations in the text feature extractor
(such as tf-idf) are not specific to working with documents and text.  I
have used them for clustering other types of data.  I was happy to find the
tools in scikit-learn.
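
To make that concrete, here is a minimal sketch of tf-idf weighting applied to a count matrix that has nothing to do with text. It is pure Python and is not scikit-learn's implementation: the idf formula (log(n / df) + 1), the L2 row normalization, the `tfidf` helper and the `events` data are all assumptions chosen for illustration, and the library's transformer may use a different smoothing.

```python
import math


def tfidf(counts):
    """Return L2-normalized tf-idf rows for a list of count rows."""
    n_rows = len(counts)
    n_cols = len(counts[0])
    # "Document frequency": number of rows in which each column is nonzero.
    df = [sum(1 for row in counts if row[j] > 0) for j in range(n_cols)]
    # Assumed idf variant: log(n / df) + 1; columns that never occur get 0.
    idf = [math.log(n_rows / d) + 1.0 if d else 0.0 for d in df]
    weighted = []
    for row in counts:
        w = [c * i for c, i in zip(row, idf)]
        norm = math.sqrt(sum(x * x for x in w))
        # Leave all-zero rows untouched to avoid dividing by zero.
        weighted.append([x / norm for x in w] if norm else w)
    return weighted


# Hypothetical non-text data: rows are users, columns are event types.
events = [
    [10, 0, 1],
    [8, 2, 0],
    [9, 0, 0],
]
rows = tfidf(events)
```

The first column is nonzero for every row, so it gets the smallest idf weight, while the rarer columns are boosted before clustering.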

Conrad

On Fri, Sep 30, 2011 at 4:41 PM, Jacob VanderPlas <[email protected]> wrote:

> Olivier, thanks for the link: I missed the info in the tutorial.  We
> should think about adding a link to the tutorial at the top of the main
> documentation page.
> Vlad, Xinfan - thanks for stepping up to work on this!
>   Jake
>
> Olivier Grisel wrote:
> > 2011/9/30 Jacob VanderPlas <[email protected]>:
> >
> >> Hi all,
> >> I spent a half hour last night trying to understand the text feature
> >> extractors in sklearn.feature_extraction.text.  I frankly got nowhere:
> >> it is woefully under-documented, both in doc-strings and the online
> >> documentation.  Is there anybody who has a familiarity with these
> >> routines and would be willing to spend some time on the docs?  That
> >> would be a huge contribution to the usability of scikit-learn.  Thanks
> >>   Jake
> >>
> >
> > I agree and I am very sorry for that. In the short term the best
> > source of documentation is the tutorial:
> >
> > http://scikit-learn.github.com/scikit-learn-tutorial/working_with_text_data.html
> >
> > It's still on my todo list to work on simplifying the current API and
> > documenting it correctly in the reference documentation.
> >
> >
>
>
> ------------------------------------------------------------------------------
> All of the data generated in your IT infrastructure is seriously valuable.
> Why? It contains a definitive record of application performance, security
> threats, fraudulent activity, and more. Splunk takes this data and makes
> sense of it. IT sense. And common sense.
> http://p.sf.net/sfu/splunk-d2dcopy2
> _______________________________________________
> Scikit-learn-general mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>