On 23 March 2012 at 13:58, Olivier Grisel <olivier.gri...@ensta.org> wrote:
> On 23 March 2012 at 13:27, Philipp Singer <kill...@gmail.com> wrote:
>> Okay, so the tfidf values are for the whole corpus.
>
> Well, not exactly: the IDF weights are "trained" on the training slice
> of the corpus and can then be reused for the new data from the test
> corpus.
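For concreteness, here is a minimal sketch of the fit-on-train, reuse-on-test pattern described in the quoted text. It assumes a scikit-learn version that provides TfidfVectorizer with fit_transform, transform and the idf_ attribute; the toy documents and variable names are made up for illustration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Toy documents, invented for this example only.
train_docs = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "dogs and cats make good pets",
]
test_docs = ["the cat chased a bird"]

vectorizer = TfidfVectorizer()

# Learn the vocabulary and the IDF weights from the training slice only.
X_train = vectorizer.fit_transform(train_docs)
idf_after_fit = vectorizer.idf_.copy()

# Reuse the learned IDF weights on the test documents.
# Words that were not seen during fitting (here "bird") are ignored.
X_test = vectorizer.transform(test_docs)

# Both matrices share the same columns, one per training vocabulary term.
print(X_train.shape, X_test.shape)
```

Calling transform does not update the IDF weights, so the test documents have no influence on them.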
In a transductive setting (with a fixed test set, i.e. no "unseen" samples) it might make sense to fit idf on the whole corpus.

-- 
Lars Buitinck
Scientific programmer, ILPS
University of Amsterdam

_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general