2013/3/12 Raj Arasu <[email protected]>:
> I am new to the "hashing trick" in general, but should I expect to get the
> same coefficient matrix when training a BernoulliNB model using a
> DictVectorizer versus a FeatureHasher as feature extractors?  I am getting
> different coefficient matrixes.

No, you will most likely not get the same coef_, except by rare
coincidence. The order of the columns will be different, and their
number will probably differ too (with FeatureHasher it equals
n_features). Some columns may correspond to multiple input features
because of hash collisions, and other columns may not correspond to
any feature at all (slack columns that nothing hashes to). This is the
downside of the hashing trick: models are harder to interpret and
harder to compare with ones trained without hashing.
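
To make the column mismatch concrete, here is a minimal pure-Python
sketch, not scikit-learn's actual implementation. It assumes a sorted
vocabulary for the dictionary-style mapping, and it uses hashlib.md5
as a stand-in hash (FeatureHasher uses a different hash function). The
helper names, the sample data and the table size of 8 are made up for
illustration.

```python
import hashlib


def dict_columns(samples):
    """Give every distinct feature name its own column (sorted vocabulary)."""
    vocab = sorted({name for sample in samples for name in sample})
    return {name: i for i, name in enumerate(vocab)}


def hashed_columns(samples, n_features):
    """Map every feature name to hash(name) % n_features.

    md5 is only a stand-in; it is NOT the hash FeatureHasher uses.
    """
    cols = {}
    for sample in samples:
        for name in sample:
            digest = hashlib.md5(name.encode("utf-8")).digest()
            cols[name] = int.from_bytes(digest[:4], "little") % n_features
    return cols


# Hypothetical toy data: three samples, four distinct feature names.
samples = [{"cat": 1, "dog": 1}, {"dog": 1, "fish": 1}, {"bird": 1}]
n_features = 8  # hypothetical size of the hashed feature space

exact = dict_columns(samples)
hashed = hashed_columns(samples, n_features)

used = set(hashed.values())
slack = sorted(set(range(n_features)) - used)  # columns nothing hashes to
collisions = len(hashed) - len(used)           # names sharing a column

print("vocabulary columns:", exact)
print("hashed columns:    ", hashed)
print("slack columns:     ", slack)
print("collisions:        ", collisions)
```

The vocabulary mapping has exactly one column per feature name, while
the hashed mapping always has n_features columns, in an order that has
nothing to do with the feature names. Any per-column array learned on
top of the two representations therefore cannot be compared entry by
entry.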

-- 
Lars Buitinck
Scientific programmer, ILPS
University of Amsterdam

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
