Lars Buitinck <larsmans@...> writes:

> The way to combine HV and Tfidf is
> 
> hashing = HashingVectorizer(non_negative=True, norm=None)
> tfidf = TfidfTransformer()
> hashing_tfidf = Pipeline([("hashing", hashing), ("tfidf", tfidf)])
> 

I notice that you set the non_negative option in HashingVectorizer() when
following hashing with TF-IDF.

Since using non_negative eliminates some information, I am curious whether
there is any harm in allowing negative values as inputs to the TF-IDF
function. In the general case, feature values, whether positive or negative,
should simply be scaled up according to how document-infrequent they are, so I
don't see the harm in allowing negative values.
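
To make the scaling argument concrete, here is a minimal pure-Python sketch of
what I have in mind. It is not scikit-learn's actual implementation: the helper
names (idf_weights, tfidf_scale) and the toy matrix are made up for
illustration, and I am assuming a smoothed IDF of the form
log((1 + n) / (1 + df)) + 1, where df counts any nonzero entry, positive or
negative, as an occurrence.

```python
import math

def idf_weights(rows):
    """Smoothed IDF per column; any nonzero entry counts toward df (assumption)."""
    n_docs = len(rows)
    n_cols = len(rows[0])
    weights = []
    for j in range(n_cols):
        # Document frequency: number of rows with a nonzero value in column j.
        df = sum(1 for row in rows if row[j] != 0)
        weights.append(math.log((1 + n_docs) / (1 + df)) + 1.0)
    return weights

def tfidf_scale(rows):
    """Multiply every entry by its column's IDF weight (no normalization)."""
    weights = idf_weights(rows)
    return [[value * w for value, w in zip(row, weights)] for row in rows]

# Toy signed "hashed" counts: 3 documents, 3 hash buckets (made-up values).
signed_counts = [
    [2, -1, 0],
    [1, 0, 0],
    [1, 0, -3],
]

weights = idf_weights(signed_counts)
scaled = tfidf_scale(signed_counts)

for row in scaled:
    print(row)
```

Under these assumptions every IDF weight is positive, so each entry keeps its
sign and only its magnitude changes. The sketch deliberately leaves out
normalization and sublinear (log-scaled) term frequencies, so it says nothing
about how those steps behave with negative inputs.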

-Apu


