2013/6/2 Harold Nguyen <har...@nexgate.com>:
> http://scikit-learn.org/dev/modules/generated/sklearn.feature_extraction.text.TfidfVectorizer.html
> Does TfidfVectorizer take a sequence of filenames, where each file is just a
> plain text file ?

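That depends on the input parameter (the first in the list). With
input='filename', the vectorizer expects a sequence of file names and
reads each file itself; the default, input='content', expects the text
directly. In the example, I set it to 'filename'.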
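
For illustration, here is a minimal sketch of the input='filename' case.
The two files and their contents are made up for this example and are
written to a temporary directory; only the TfidfVectorizer call itself
reflects what is discussed above.

```python
import os
import tempfile

from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical toy corpus: two plain text files in a temporary directory.
tmpdir = tempfile.mkdtemp()
paths = []
for name, text in [("a.txt", "the cat sat on the mat"),
                   ("b.txt", "the dog sat on the log")]:
    path = os.path.join(tmpdir, name)
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)
    paths.append(path)

# input='filename' tells the vectorizer that each item is a path to read,
# not the document text itself.
vectorizer = TfidfVectorizer(input="filename")
X = vectorizer.fit_transform(paths)

# One row per file, one column per distinct term.
print(X.shape)
print(sorted(vectorizer.vocabulary_))
```

With the default input='content', the same call would treat the path
strings themselves as the documents.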

> Also, according to the link, I thought min_df only lives in the interval
> [0.0, 1.0] or did I misunderstand ? If it's an "int" does that recommend the
> number of occurrences rather than the frequency ?

This is described in the docs (in somewhat poor grammar) as "If float,
the parameter represents a proportion of documents, integer absolute
counts." So an int is an absolute count, but of documents rather than
of occurrences: min_df=2 means at least two documents must contain a
term for it to become a feature in X.
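
A small sketch of the difference between the int and float forms, on a
made-up four-document corpus (the documents are an assumption for
illustration only):

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical corpus. Document frequencies:
# apple=3, banana=2, cherry=1, date=1, egg=1
docs = [
    "apple banana",
    "apple cherry",
    "apple banana date",
    "egg",
]

# int: keep terms that appear in at least 2 documents.
by_count = TfidfVectorizer(min_df=2).fit(docs)
terms_by_count = sorted(by_count.vocabulary_)

# float: keep terms that appear in at least 75% of the documents
# (0.75 * 4 = 3 documents here).
by_fraction = TfidfVectorizer(min_df=0.75).fit(docs)
terms_by_fraction = sorted(by_fraction.vocabulary_)

print(terms_by_count)
print(terms_by_fraction)
```

Note that the type matters, not just the value: min_df=1 (int) means
"at least one document", whereas min_df=1.0 (float) is read as a
proportion, i.e. all documents.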

-- 
Lars Buitinck
Scientific programmer, ILPS
University of Amsterdam

_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
