2013/3/12 Raj Arasu <[email protected]>:
> I am new to the "hashing trick" in general, but should I expect to get the
> same coefficient matrix when training a BernoulliNB model using a
> DictVectorizer versus a FeatureHasher as feature extractors? I am getting
> different coefficient matrixes.
No, you will most likely not get the same coef_, except by rare coincidence. The order of the columns will differ; their number will probably differ too (with FeatureHasher it equals n_features); some columns may correspond to several input features (hash collisions); and other columns may not correspond to anything at all (slack columns that no input feature hashes to).

This is the downside of the hashing trick: models become harder to interpret and harder to compare with models trained without hashing.

-- 
Lars Buitinck
Scientific programmer, ILPS
University of Amsterdam

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
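A minimal sketch of the effect described in the reply, assuming a recent scikit-learn release (where FeatureHasher takes an alternate_sign parameter and BernoulliNB's per-feature parameters are inspected through feature_log_prob_ rather than the coef_ attribute discussed in this thread). The toy dicts, labels and n_features=8 are hypothetical choices for illustration only.

```python
import numpy as np
from sklearn.feature_extraction import DictVectorizer, FeatureHasher
from sklearn.naive_bayes import BernoulliNB

# Hypothetical toy data: each sample is a dict of feature name -> value.
X_dicts = [
    {"apple": 1, "banana": 1},
    {"banana": 1, "cherry": 1},
    {"apple": 1, "durian": 1},
    {"cherry": 1, "durian": 1},
]
y = [0, 1, 0, 1]

# Without hashing: one column per distinct feature name (sorted by default).
dv = DictVectorizer()
X_dv = dv.fit_transform(X_dicts)
nb_dv = BernoulliNB().fit(X_dv, y)

# With hashing: the number of columns is fixed up front by n_features and each
# name is mapped to a column by a hash, so columns carry no names.
# alternate_sign=False keeps values non-negative, so BernoulliNB's default
# binarize=0.0 threshold still sees every present feature as 1.
fh = FeatureHasher(n_features=8, alternate_sign=False)
X_fh = fh.transform(X_dicts)
nb_fh = BernoulliNB().fit(X_fh, y)

# Columns that at least one input feature hashed to; the rest are slack.
used_columns = np.unique(X_fh.nonzero()[1])

print(dv.feature_names_)             # named columns of the unhashed model
print(nb_dv.feature_log_prob_.shape)  # (n_classes, n_distinct_features)
print(nb_fh.feature_log_prob_.shape)  # (n_classes, n_features)
print("used hashed columns:", used_columns)
```

With four distinct names and eight hashed columns, at least four columns must be slack, and any two names landing in the same column are merged, which is why the two parameter matrices cannot be compared column by column.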
