Re: [Scikit-learn-general] Text Documents - Vectorizer

Lars Buitinck Fri, 23 Mar 2012 06:07:27 -0700

On 23 March 2012 13:58, Olivier Grisel <olivier.gri...@ensta.org> wrote:
> On 23 March 2012 13:27, Philipp Singer <kill...@gmail.com> wrote:
>> Okay, so the tfidf values are for the whole corpus.
>
> Well, not exactly: the IDF weights are "trained" on the training slice
> of the corpus and can then be reused for new data from the test
> corpus.


In a transductive setting (with a fixed test set, i.e. no "unseen"
samples) it might make sense to fit idf on the whole corpus.
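
To make the difference concrete, here is a minimal sketch of both setups.
It assumes a recent scikit-learn in which TfidfVectorizer is available;
the toy documents and variable names are made up for illustration and do
not come from this thread.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical toy corpus, for illustration only.
train_docs = ["the cat sat on the mat", "the dog chased the cat"]
test_docs = ["the bird sat on the dog", "a bird is a bird"]

# Inductive setting: vocabulary and IDF weights are learned from the
# training slice only, then reused unchanged on the test documents.
inductive = TfidfVectorizer()
X_train = inductive.fit_transform(train_docs)
X_test = inductive.transform(test_docs)

# Transductive setting: the test set is fixed and known up front, so the
# vocabulary and IDF weights are fitted on the whole corpus.
transductive = TfidfVectorizer()
X_all = transductive.fit_transform(train_docs + test_docs)
X_train_t = X_all[:len(train_docs)]
X_test_t = X_all[len(train_docs):]
```

In the first variant, terms that occur only in the test documents ("bird"
here) are simply dropped at transform time; in the second they get their
own column and IDF weight.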

-- 
Lars Buitinck
Scientific programmer, ILPS
University of Amsterdam
