Re: [Scikit-learn-general] Different BernoulliNB coef matrix when using DictVectorizer vs FeatureHasher

Lars Buitinck Tue, 12 Mar 2013 04:24:28 -0700

2013/3/12 Raj Arasu <[email protected]>:
> I am new to the "hashing trick" in general, but should I expect to get the
> same coefficient matrix when training a BernoulliNB model using a
> DictVectorizer versus a FeatureHasher as feature extractors?  I am getting
> different coefficient matrixes.


No, you will most likely not get the same coef_, except by rare
coincidence. The order of the columns will be different, and their
number will probably be different too: with the FeatureHasher it is
fixed by n_features, not by the number of distinct features seen.
Some columns may correspond to multiple input features (hash
collisions), and other columns may not correspond to anything
(because they're unused slack columns). This is the downside of the
hashing trick: models are harder to interpret and hard to compare
with ones trained without hashing.
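
To make the column mismatch concrete, here is a toy sketch in plain
Python. It is not scikit-learn's implementation: dict_columns and
hashed_column are hypothetical helpers, MD5 stands in for the
MurmurHash3 function that FeatureHasher uses, and the sample data and
n_features = 8 are made up for illustration.

```python
import hashlib


def dict_columns(samples):
    """DictVectorizer-like: one column per distinct name, in sorted order."""
    names = sorted({name for sample in samples for name in sample})
    return {name: index for index, name in enumerate(names)}


def hashed_column(name, n_features):
    """FeatureHasher-like: column index derived from a stable hash of the name.

    MD5 is only a stand-in here; it is an assumption of this sketch.
    """
    digest = hashlib.md5(name.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "little") % n_features


# Made-up input: three samples with four distinct feature names.
samples = [{"cat": 1, "dog": 1}, {"dog": 1, "fish": 1}, {"bird": 1}]
n_features = 8  # hypothetical hash table size

vocab = dict_columns(samples)
hashed = {name: hashed_column(name, n_features) for name in vocab}
used = set(hashed.values())
unused = sorted(set(range(n_features)) - used)

print("dict columns:         ", vocab)
print("hashed columns:       ", hashed)
print("unused hashed columns:", unused)
print("colliding features:   ", len(hashed) - len(used))
```

The dictionary mapping yields exactly four columns in sorted name
order, while the hashed mapping always spans eight columns, in an
order unrelated to the names, with at least four of them never used.
A coefficient matrix fitted on one layout cannot be lined up with the
other column by column.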

-- 
Lars Buitinck
Scientific programmer, ILPS
University of Amsterdam

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
