2013/4/9 Terry Peng <[email protected]>:
> Hi all,
>
> The HashingVectorizer documentation says:
>
>     - there is no way to compute the inverse transform (from feature indices
>       to string feature names) which can be a problem when trying to
>       introspect which features are most important to a model.
>
> But I'm wondering whether I can keep the mapping somewhere else to do the
> inverse transform. For example, I could get the indices from
> hashingvectorizer.transform([text]).nonzero() and then get the words from
> the text, or pass a dictionary to hashingvectorizer.transform to make sure
> words and indices are in a consistent order.
>
> One problem is that there can be collisions, so different words can map to
> the same index, but I think that is quite rare, especially if I only want
> to get the most important feature of a single document.

In case of a collision, there is no way to tell which feature string
is the frequent one and which are the rare ones that collide with it.
We need to add a tracing mode to HashingVectorizer to be able to
implement such an inverse transform correctly.
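
To make the idea of a tracing mode concrete, here is a minimal sketch in
plain Python. It is not scikit-learn code and not an existing
HashingVectorizer option: the names bucket, transform_with_trace and
inverse_transform are hypothetical, and zlib.crc32 is only a stand-in for
the MurmurHash3 function that scikit-learn uses. The point is that the
trace has to store per-column token counts, otherwise colliding tokens
cannot be ranked.

```python
import zlib
from collections import Counter, defaultdict

N_FEATURES = 8  # deliberately tiny so that collisions are likely


def bucket(token, n_features=N_FEATURES):
    """Map a token to a column index (toy stand-in for the real hash)."""
    return zlib.crc32(token.encode("utf-8")) % n_features


def transform_with_trace(tokens, trace, n_features=N_FEATURES):
    """Hash tokens into a count vector and record which tokens hit each column."""
    vector = [0] * n_features
    for token in tokens:
        index = bucket(token, n_features)
        vector[index] += 1
        trace[index][token] += 1  # hypothetical "tracing mode" bookkeeping
    return vector


def inverse_transform(vector, trace):
    """For each non-zero column, list candidate tokens, most frequent first."""
    return {
        index: [token for token, _ in trace[index].most_common()]
        for index, value in enumerate(vector)
        if value
    }


trace = defaultdict(Counter)
tokens = "the quick brown fox jumps over the lazy dog".split()
vector = transform_with_trace(tokens, trace)
candidates = inverse_transform(vector, trace)
```

A column with more than one candidate is a collision. The counts in the
trace say which token was seen most often there, but the trace grows with
the vocabulary, which gives up the constant-memory property that motivates
hashing in the first place.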

--
Olivier

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
