2013/4/9 Terry Peng <[email protected]>:
> Hi all,
>
> HashingVectorizer's documentation says:
>
> - there is no way to compute the inverse transform (from feature indices to
>   string feature names) which can be a problem when trying to introspect
>   which features are most important to a model.
>
> but I'm wondering if I can keep the mapping somewhere else to do the
> inverse transform. E.g. I can just get the indices from
> hashingvectorizer.transform([text]).nonzero() and then get the words from
> the text, or pass a dictionary to hashingvectorizer.transform to make sure
> words/indices are in a consistent order.
>
> One problem with it is that there can be collisions, so different words
> can map to the same indices, but I think that's quite rare, especially if I
> only want to get the most important feature from a single document.
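The side-mapping idea in the question can be sketched roughly as follows. This is a minimal sketch, not a scikit-learn feature: `build_reverse_map` and `top_features` are hypothetical helpers, and the code assumes a recent scikit-learn (where `HashingVectorizer` has an `alternate_sign` parameter) and assumes that `HashingVectorizer` hashes each analyzer token the same way as a `FeatureHasher(input_type="string")` with matching `n_features` and `alternate_sign`. Rather than storing one word per index, it records every string seen for each column, so collisions stay visible instead of being silently resolved.

```python
from collections import Counter, defaultdict

from sklearn.feature_extraction import FeatureHasher
from sklearn.feature_extraction.text import HashingVectorizer


def build_reverse_map(vectorizer, docs):
    """Hypothetical helper: map each hashed column to the token strings seen in it.

    Assumes the vectorizer hashes analyzer tokens exactly like a
    FeatureHasher(input_type="string") with the same n_features and
    alternate_sign (the assertion below checks the columns line up).
    """
    analyzer = vectorizer.build_analyzer()
    hasher = FeatureHasher(
        n_features=vectorizer.n_features,
        input_type="string",
        alternate_sign=vectorizer.alternate_sign,
    )
    reverse = defaultdict(Counter)
    for doc in docs:
        for token in analyzer(doc):
            # Hash the single token to find the column it lands in.
            col = hasher.transform([[token]]).indices[0]
            reverse[int(col)][token] += 1
    # Sanity check: every nonzero column the vectorizer emits is covered.
    _, cols = vectorizer.transform(docs).nonzero()
    assert set(cols.tolist()) <= set(reverse)
    return dict(reverse)


def top_features(vectorizer, reverse, text, k=3):
    """Hypothetical helper: the k largest-magnitude columns of one document,
    each with every candidate string recorded for that column."""
    row = vectorizer.transform([text])
    order = abs(row.data).argsort()[::-1][:k]
    return [
        (int(row.indices[i]), sorted(reverse.get(int(row.indices[i]), ())))
        for i in order
    ]


docs = ["the cat sat on the mat", "the dog ate the cat food"]
# A deliberately tiny n_features forces collisions for illustration.
hv = HashingVectorizer(n_features=4, alternate_sign=False)
reverse = build_reverse_map(hv, docs)
collisions = {col: sorted(toks) for col, toks in reverse.items() if len(toks) > 1}
print(collisions)
print(top_features(hv, reverse, docs[0]))
```

The map can only list candidate strings for a column: when several strings share it, the transformed value mixes their counts, so the map alone cannot say which of them drove the feature's weight.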
In case of a collision, there is no way to tell which of the colliding feature strings is the frequent one and which are the rare events that happen to collide with it. We need to add a tracing mode to HashingVectorizer to be able to implement such an inverse transform correctly.

--
Olivier

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
