2012/2/13 Andreas <[email protected]>:
> On 02/13/2012 09:49 PM, Lars Buitinck wrote:
>> I verified that the features coming from text.Vectorizer are
>> normalized; they're all in the range [-1, 1].
>>
> I guess that is not the problem here but chi2 is only defined for
> positive input, right?

Strictly speaking, yes, so I should have turned idf weighting off, but
chi² seems to handle negative numbers without crashing. After chi²,
all feature values are in the range [0, 1], as expected. I've been
planning to look into tf-idf for quite some time, to see whether I can
get it to always return non-negative numbers (this has been discussed
on the mailing list before, and there's an open issue for it).
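
To illustrate why the sign of the input matters, here is a minimal
pure-Python sketch of the usual contingency-table chi² statistic, with
feature values treated as counts or frequencies. The function name
chi2_scores and the toy data are made up for this example, and I'm
assuming (not claiming) that the library's chi2 follows the same
formulation. With negative values the per-class sums can cancel or go
negative, so the statistic stops meaning anything, which is why the
sketch rejects them outright. Features whose expected total is zero are
simply scored as 0 here.

```python
def chi2_scores(X, y):
    """Chi-squared statistic per feature (hypothetical helper).

    X: list of rows of non-negative feature values.
    y: list of class labels, one per row.
    """
    if any(v < 0 for row in X for v in row):
        raise ValueError("chi2 is only defined for non-negative input")

    classes = sorted(set(y))
    n_samples = len(X)
    n_features = len(X[0])

    # observed[c][j]: sum of feature j over all samples of class c
    observed = {c: [0.0] * n_features for c in classes}
    for row, label in zip(X, y):
        for j, value in enumerate(row):
            observed[label][j] += value

    # Total mass of each feature, and prior probability of each class
    feature_total = [sum(observed[c][j] for c in classes)
                     for j in range(n_features)]
    class_prob = {c: y.count(c) / n_samples for c in classes}

    scores = []
    for j in range(n_features):
        score = 0.0
        for c in classes:
            # Expected mass if feature j were independent of the class
            expected = class_prob[c] * feature_total[j]
            if expected > 0:
                score += (observed[c][j] - expected) ** 2 / expected
        scores.append(score)
    return scores


# Toy data: features 0 and 1 each occur in one class only,
# feature 2 occurs equally in both classes.
X = [[1, 0, 1],
     [1, 0, 1],
     [0, 1, 1],
     [0, 1, 1]]
y = [0, 0, 1, 1]
scores = chi2_scores(X, y)
```

On this toy data the two class-specific features get the same non-zero
score and the third, which is spread evenly over both classes, gets 0.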

-- 
Lars Buitinck
Scientific programmer, ILPS
University of Amsterdam

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
