2012/2/13 Andreas <[email protected]>:
> On 02/13/2012 09:49 PM, Lars Buitinck wrote:
>> I verified that the features coming from text.Vectorizer are
>> normalized; they're all in the range [-1, 1].
>>
> I guess that is not the problem here but chi2 is only defined for
> positive input, right?

Strictly, yes, so I should have turned idf weighting off, but chi²
seems to be handling negative numbers without crashing. After chi², all
feature values are in the range [0, 1], as can be expected.

I've been planning to look into tfidf for quite some time, to see
whether I can get it to always return positive numbers (there's been
some discussion on the ML about this before and there's an open issue
for it).

-- 
Lars Buitinck
Scientific programmer, ILPS
University of Amsterdam

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
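
To illustrate why the sign of the input matters: below is a minimal sketch of the textbook chi² feature score in plain Python. It is not scikit-learn's code; the function name chi2_scores and the toy data are made up for this example, and it assumes the usual form where "observed" is the per-class sum of a feature and "expected" is the class prior times the feature's total. With non-negative features every term is non-negative; with negative features the expected value can go negative, so nothing crashes but the resulting "score" no longer means anything.

```python
def chi2_scores(X, y):
    """Chi-squared score for each column of X against the labels y.

    X is a list of rows (lists of feature values), y a list of labels.
    Hypothetical helper for illustration only.
    """
    n_samples = len(X)
    n_features = len(X[0])
    # Total "mass" of each feature over all samples.
    feature_total = [sum(row[j] for row in X) for j in range(n_features)]
    scores = [0.0] * n_features
    for c in sorted(set(y)):
        rows = [row for row, label in zip(X, y) if label == c]
        class_prob = len(rows) / n_samples
        for j in range(n_features):
            observed = sum(row[j] for row in rows)
            expected = class_prob * feature_total[j]
            if expected == 0:
                # Feature total is zero; skip to avoid dividing by zero.
                continue
            scores[j] += (observed - expected) ** 2 / expected
    return scores


y = [0, 0, 1, 1]

# Non-negative features: every score is >= 0, as the statistic requires.
X_pos = [[1.0, 0.0], [0.8, 0.2], [0.0, 1.0], [0.1, 0.9]]
print(chi2_scores(X_pos, y))

# Second feature has negative values: its total, and hence its expected
# value, is negative, and the "score" comes out negative.
X_neg = [[1.0, -0.5], [0.8, -0.4], [0.0, 0.2], [0.1, 0.1]]
print(chi2_scores(X_neg, y))
```

So the computation runs either way, but only the non-negative case gives a number that can be read as a chi² statistic.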
