Yes. Text feature extraction is inherently difficult, given the many
possible features (words, n-grams, POS tags, etc.) and the many possible
weighting schemes (log, tf-idf, tf, etc.). Even though I have used
feature_extraction.text for a while, every time I want to use it again I
have to go back to my old programs and copy snippets from them.
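
To make the feature/weighting combinations concrete, here is a minimal
plain-Python sketch. It does not use the scikit's API at all; the names
word_ngrams and weight_documents are hypothetical, and the unsmoothed
idf formula log(N / df) is only one of several common variants, so
treat it as an assumption rather than what feature_extraction.text does.

```python
import math
from collections import Counter


def word_ngrams(text, n_max=2):
    """Return all word n-grams of length 1..n_max as space-joined strings."""
    tokens = text.lower().split()
    return [
        " ".join(tokens[i:i + n])
        for n in range(1, n_max + 1)
        for i in range(len(tokens) - n + 1)
    ]


def weight_documents(docs, scheme="tf-idf", n_max=2):
    """Map each document to a {feature: weight} dict under one scheme."""
    counts = [Counter(word_ngrams(doc, n_max)) for doc in docs]
    # Document frequency: number of documents containing each feature.
    df = Counter(feature for c in counts for feature in c)
    n_docs = len(docs)

    def weight(feature, tf):
        if scheme == "tf":
            return float(tf)                 # raw term frequency
        if scheme == "log":
            return 1.0 + math.log(tf)        # dampened term frequency
        if scheme == "tf-idf":
            # Assumed variant: unsmoothed idf = log(N / df).
            return tf * math.log(n_docs / df[feature])
        raise ValueError("unknown scheme: %r" % scheme)

    return [{f: weight(f, tf) for f, tf in c.items()} for c in counts]


docs = ["the cat sat on the mat", "the dog sat"]
tf_weights = weight_documents(docs, scheme="tf")
tfidf_weights = weight_documents(docs, scheme="tf-idf")
```

Switching the scheme argument or n_max is enough to move between the
combinations mentioned above; POS tags would need a tagger and are left
out of this sketch.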

So my proposal is this: besides the official documentation, it would be
helpful to write up some typical scenarios for using this module. The
code illustrating each scenario could be as short as a few lines.
I have written several such snippets and would like to contribute them if
they are useful. Of course, they are not of the same quality as the official
examples, so I think it would be more appropriate to create an "external
documents" web page that hosts links to those small snippets.

On Sat, Oct 1, 2011 at 12:18 AM, Mathieu Blondel <[email protected]> wrote:

> On Sat, Oct 1, 2011 at 12:39 AM, Gael Varoquaux
> <[email protected]> wrote:
> > Yes. In addition, undertested and breaks the API of the scikit. The
> > reason is that it doesn't get used much and is developed only by a small
> > fraction of the development team.
>
> Most modules are developed by 2-3 people.
>
> Designing a good text feature extraction API is surprisingly hard but
> it's a useful component of the scikit.
>
> Mathieu
>
>
> ------------------------------------------------------------------------------
> All of the data generated in your IT infrastructure is seriously valuable.
> Why? It contains a definitive record of application performance, security
> threats, fraudulent activity, and more. Splunk takes this data and makes
> sense of it. IT sense. And common sense.
> http://p.sf.net/sfu/splunk-d2dcopy2
> _______________________________________________
> Scikit-learn-general mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>



-- 
Best Wishes
--------------------------------------------
Meng Xinfan(蒙新泛)
Institute of Computational Linguistics
Department of Computer Science & Technology
School of Electronic Engineering & Computer Science
Peking University
Beijing, 100871
China